Matt Borden Jardine

Rust Convolution VST

github.com/BordenJardine/reverb_vst

I was looking to put some of my theoretical DSP knowledge to practical use and dip my toes into Rust at the same time. I came up with this relatively simple stereo convolution spring reverb VST. It uses the VST crate to plug into a DAW and convolves incoming audio with an impulse response I created by 'exciting' the spring of a Eurorack reverb module with a fork. I made that module too; you can read about it here.

This is how it sounds.

It might not be the prettiest reverb, but I think it could be musically useful for the kind of stuff I like to make.

The Convolution Algorithm:


Convolution, at least in the audio world, is the process of combining the spectral character of two signals. The algorithm is pretty simple: it boils down to a long series of multiply-and-add operations (a sliding dot product). In plain English: for each sample in the original signal, create a new copy of the second signal (called the ‘impulse response’ or ‘IR’) by multiplying each sample in the impulse response by that one sample from the original signal. This results in several new signals: one scaled copy of the IR for each sample in the original signal. If you layer these signals back together, offsetting each one by the position of its original sample, the result is the convolution of the two signals.
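
To make that concrete, here is a minimal sketch of direct, time-domain convolution. The function name and the brute-force structure are mine for illustration, not the plugin's actual code:

/// Naive time-domain convolution: for every input sample, layer in a copy of
/// the impulse response scaled by that sample, offset by the sample's position.
pub fn convolve_direct(input: &[f32], ir: &[f32]) -> Vec<f32> {
    // the result is longer than the input: input length + IR length - 1
    let mut output = vec![0.; input.len() + ir.len() - 1];
    for (n, &x) in input.iter().enumerate() {
        for (k, &h) in ir.iter().enumerate() {
            output[n + k] += x * h;
        }
    }
    output
}

Note that this does input.len() × ir.len() multiplications, which is exactly where the performance problem described below comes from.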

This is the sample I used for the impulse response. Again, it's basically me hitting a mic'd up spring with a fork.

At first I was surprised that such a simple algorithm works. After thinking about it a little, it makes plenty of intuitive sense. Scaling and layering copies of the IR effectively filters the input through the IR's frequency content: frequencies the IR is rich in come through strongly, while frequencies it lacks are attenuated.

There are, however, a few issues with this approach if you want to make a VST.

Problems:


1. Performance

The first issue is performance. VSTs need to act in real time and the process described above requires way too much computation for that. How can we make it faster?

2. IR Length

The second issue has to do with the audio buffers provided by the VST host. Generally these buffers are 1024 samples or fewer. Pretty much any impulse response that is useful for reverb will be much (much) longer than that; the one used in this VST, for instance, is about 80kb. How do we convolve two signals of wildly different sizes in real time?

3. Convolution Tail

The third issue is that the input and output buffers of a VST must be the same length. The convolution algorithm, however, produces an output signal that is longer than the input signal. Why is this so? If an entire copy of the impulse response is added to the output signal for each sample of the input signal, consider what happens for the very last sample of the buffer that is passed into the VST: the entirety of the IR has to be added to the output starting at that last index. Put another way, the output signal’s length will always be the length of the input signal plus the length of the impulse response (less one sample). What do we do with this leftover section of the output?

Solutions:


1. Time → Frequency

The solution to the performance problem is a little complicated but, thankfully, well documented. The signals we have been working with so far are chronologically ordered lists of samples. This means we have been working in the time domain. It turns out convolution requires far fewer operations in the frequency domain. I’m not going to get into the details of why and how this works, so, to cut to the chase, we can get real-time convolution by following these steps:

  1. perform a Fast Fourier Transform (FFT) on both the incoming signal and the Impulse Response to convert them to the frequency domain
  2. convolve the resulting frequency spectra with a simple pointwise multiplication
  3. perform an inverse FFT on the product to get it back into the time domain
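
Here is a rough sketch of those three steps using the same RustFFT crate. It is a one-shot version that assumes both signals fit in a single FFT (the function name and padding scheme are mine); the segmented, real-time version comes later:

use rustfft::{FftPlanner, num_complex::Complex};

// Convolve two signals by multiplying their spectra.
// `fft_size` must be at least input.len() + ir.len() - 1, otherwise the
// result wraps around (circular convolution).
pub fn convolve_fft(input: &[f32], ir: &[f32], fft_size: usize) -> Vec<f32> {
    let mut planner = FftPlanner::<f32>::new();
    let fft = planner.plan_fft_forward(fft_size);
    let ifft = planner.plan_fft_inverse(fft_size);

    // 1. zero-pad both signals to fft_size and transform them to the frequency domain
    let mut a: Vec<Complex<f32>> = input.iter().map(|&s| Complex { re: s, im: 0. }).collect();
    let mut b: Vec<Complex<f32>> = ir.iter().map(|&s| Complex { re: s, im: 0. }).collect();
    a.resize(fft_size, Complex { re: 0., im: 0. });
    b.resize(fft_size, Complex { re: 0., im: 0. });
    fft.process(&mut a);
    fft.process(&mut b);

    // 2. pointwise multiplication of the two spectra
    let mut product: Vec<Complex<f32>> = a.iter().zip(&b).map(|(x, h)| *x * *h).collect();

    // 3. inverse FFT back to the time domain
    //    (RustFFT does not normalize, so divide by the FFT size)
    ifft.process(&mut product);
    product.iter().map(|c| c.re / fft_size as f32).collect()
}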

2. Frame Queue

I couldn’t readily Google up a solution to the IR length problem, so let’s slow down and think about how reverb sounds for a second. When something sounds ‘reverby’ it generally trails off in volume gradually over time. Put another way, reverb has a ‘tail’. When you are listening to a continuous stream of audio with reverb applied, the reverby tail you are hearing at any given moment is mixed with audio from a little while ago, not only with audio from this very moment. The deeper into the reverb’s tail you listen, the older the original audio it was convolved with must be. So the solution we are looking for must involve holding onto some sort of history of the input audio.

This is what I came up with. Deal with the large size of the IR by splitting it up into a list of audio-buffer-sized frames. Keep a history of previous input buffers in a queue that is the same length as the list of IR segments. As new input buffers come in, push them onto the front of the queue and pop off the back, effectively aging out the oldest frames. Each time an audio buffer is processed, convolve the list of IR frames with the queue of previous input frames piecewise: the newest input frame with the first IR segment, the next-newest with the second, and so on. Layering these convolved signals together gives us the output for the current buffer. All of this, by the way, takes place in the frequency domain, so ‘convolution’ is multiplication and ‘layering’ is addition.
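
Written as a formula (the same relationship shows up later as a comment on convolve_frame), where X_{n-k} is the frequency-domain input frame from k buffers ago, H_k is the k-th IR segment, and K is the number of segments:

Y_n(z) = X_n(z)·H_0(z) + X_{n-1}(z)·H_1(z) + ... + X_{n-K+1}(z)·H_{K-1}(z)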

3. Overlap Add

The fix for the third problem -- the issue with the leftover tail end of the convolved signal -- is well documented. After converting the output back into the time domain, chop off anything longer than the buffer length and save it to be layered onto the next frame.
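
As a tiny sketch of that bookkeeping (the names here are illustrative; the real version lives inside process below): keep the first buffer-length samples, mix in last run's leftover tail, and stash the rest for next time.

pub fn overlap_add(mut convolved: Vec<f32>, previous_tail: &mut Vec<f32>, buffer_len: usize) -> Vec<f32> {
    // mix last run's tail into the start of this run's output
    for (i, t) in previous_tail.iter().enumerate() {
        if let Some(s) = convolved.get_mut(i) {
            *s += t;
        }
    }
    // everything past the buffer length becomes the next run's tail
    *previous_tail = convolved[buffer_len..].to_vec();
    convolved[..buffer_len].to_vec()
}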

Code:


This project leans on a few crates:

VST - Handles most of the VST implementation, allowing us to focus on the DSP part.

RustFFT - Handles FFT and inverse FFT. I was not interested in reinventing the FFT wheel.

Hound - Reads and writes wav files. I used it here to load in the spring impulse response.
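
For reference, the dependencies section of a Cargo.toml for this kind of project would look something like this (the version numbers are my assumptions, not copied from the repo):

[dependencies]
vst = "0.3"
rustfft = "6"
hound = "3"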

Much of the non-DSP-related code in the repo is cribbed more or less straight from ampli-Fe. Feel free to look around in my GitHub if you want, but it's mostly VST boilerplate, and that sort of stuff isn’t going to be the focus of this writeup. You’d be better served by looking at the VST crate documentation (which is what I did).

The VST crate and the ampli-Fe example got me to the point where I had access to input and output buffers in the form of f32 arrays. All of the novel and interesting DSP stuff happens after that point, in convolution.rs.

I usually like to start by setting up a general outline for what I want to do. Let’s make a Convolver type with a ‘process’ function that takes the input buffer as an argument and returns the output buffer.

pub struct Convolver { }

impl Convolver {
    pub fn process(&mut self, input_buffer: &[f32]) -> Vec<f32> {
        todo!() // to be filled in below
    }
}

Amazing.

Let’s start thinking about what sort of state a Convolver will need to hold onto for its lifetime. We want to store the impulse response as a list of segments, the queue of previous frames, the ‘tail’ of the previous convolution to overlap onto the current output, and in this case we are going to hold onto some of the machinery to do the FFT and iFFT. So let’s add some imports, and set up our struct.

use std::collections::VecDeque;
use std::sync::Arc;
use rustfft::{Fft, FftPlanner, num_complex::Complex};

const FFT_SIZE: usize = 1024;

pub struct Convolver {
    ir_segments: Vec<Vec<Complex<f32>>>,           // freq domain impulse response segments
    previous_frame_q: VecDeque<Vec<Complex<f32>>>, // previous freq domain input signals
    previous_tail: Vec<f32>,                       // tail of the previous output (time domain) for overlap add
    fft_processor: Arc<dyn Fft<f32>>,
    ifft_processor: Arc<dyn Fft<f32>>,             // inverse fft
}

Now let's give the type a constructor where we will initialize the FFT processors, split up the IR into segments, and initialize the queue of previous frames like we discussed earlier.

impl Convolver {
    // set up the saved, segmented IR
    pub fn new(ir_signal: &[f32]) -> Self {
        let mut planner = FftPlanner::<f32>::new();
        let fft_processor = planner.plan_fft_forward(FFT_SIZE);
        let ifft_processor = planner.plan_fft_inverse(FFT_SIZE);

        let ir_segments = segment_buffer(ir_signal, &fft_processor);
        let segment_count = ir_segments.len();

        Self {
            ir_segments,
            fft_processor,
            ifft_processor,
            previous_frame_q: init_previous_frame_q(segment_count),
            previous_tail: init_previous_tail(FFT_SIZE / 2),
        }
    }

    pub fn process(&mut self, input_buffer: &[f32]) -> Vec<f32> {
        todo!() // to be filled in below
    }
}

pub fn init_previous_tail(size: usize) -> Vec<f32> {
    let mut tail = Vec::new();
    for _ in 0..size {
        tail.push(0.);
    }
    tail
}

// - split the buffer into segments of FFT_SIZE / 2 samples
// - zero-pad each segment out to FFT_SIZE
// - FFT each segment and hold onto it
pub fn segment_buffer(buffer: &[f32], fft_processor: &Arc<dyn Fft<f32>>) -> Vec<Vec<Complex<f32>>> {
    let mut segments = Vec::new();
    let segment_size = FFT_SIZE / 2;
    let mut index = 0;
    while index < buffer.len() {
        let mut new_segment: Vec<Complex<f32>> = Vec::new();
        for i in index..index + segment_size {
            match buffer.get(i) {
                Some(sample) => new_segment.push(Complex { re: *sample, im: 0. }),
                None => continue,
            }
        }
        while new_segment.len() < FFT_SIZE {
            new_segment.push(Complex { re: 0., im: 0. });
        }
        fft_processor.process(&mut new_segment);
        segments.push(new_segment);
        index += segment_size;
    }
    segments
}

// queue of previous input segments in the frequency domain (complex numbers)
// initialized to 0s
pub fn init_previous_frame_q(segment_count: usize) -> VecDeque<Vec<Complex<f32>>> {
    let mut q = VecDeque::new();
    for _ in 0..segment_count {
        let mut empty = Vec::new();
        for _ in 0..FFT_SIZE {
            empty.push(Complex { re: 0., im: 0. });
        }
        q.push_back(empty);
    }
    q
}

Forgive me if this part of the writeup is a little bit ‘draw the rest of the owl’, but if you’ve read the theory part above I hope the code here will make some sense.
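
As a usage sketch, constructing a Convolver from the impulse response wav with Hound might look something like this. The file name, the module path, and the assumption of a mono 16-bit wav are all mine for illustration:

use crate::convolution::Convolver; // assuming the Convolver lives in convolution.rs

pub fn load_convolver() -> Convolver {
    // read the IR wav and normalize 16-bit integer samples into the -1.0..1.0 range
    let mut reader = hound::WavReader::open("spring_ir.wav").expect("couldn't open IR wav");
    let ir_signal: Vec<f32> = reader
        .samples::<i16>()
        .map(|s| s.unwrap() as f32 / i16::MAX as f32)
        .collect();
    Convolver::new(&ir_signal)
}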

Now that we have our convolver initialized, let’s start working on the ‘process’ function that will be called by the VST host to process audio in real time. We want to convert the input buffer from the time domain to the frequency domain with the FFT processor, push the new segment onto the history, convolve the signals as described in the ‘frame queue’ section above, convert them back into the time domain, and then perform the overlap add with the previous run’s leftover tail (if there is one).

// within the Convolver impl
pub fn process(&mut self, input_buffer: &[f32]) -> Vec<f32> {
    let io_len = input_buffer.len();

    // segment and convert to freq domain
    let input_segments = segment_buffer(input_buffer, &self.fft_processor);
    let mut output_segments: Vec<Vec<Complex<f32>>> = Vec::new();

    // advance the frame queue and convolve a frame
    for segment in input_segments {
        self.previous_frame_q.push_front(segment);
        self.previous_frame_q.pop_back();
        output_segments.push(self.convolve_frame());
    }

    // convert back to time domain
    let mut time_domain: Vec<f32> = Vec::new();
    for mut segment in output_segments {
        self.ifft_processor.process(&mut segment);
        for sample in segment {
            time_domain.push(sample.re);
        }
    }

    // overlap add the previous tail
    for (i, sample) in self.previous_tail.iter().enumerate() {
        match time_domain.get_mut(i) {
            Some(out_sample) => *out_sample += sample,
            None => break,
        }
    }

    // everything outside of the buffer length is the tail for the next run
    self.previous_tail = time_domain[io_len..time_domain.len()].to_vec();

    // return a buffer's worth of signal
    time_domain[0..io_len].to_vec()
}

self.convolve_frame() is doing a lot of heavy lifting here, so let’s take a look at what’s going on in there. For each segment of the IR signal, we convolve it with the segment at the same index in the history queue. Keep in mind that later parts of the history queue are older input buffers, and will therefore be convolved with later sections of the impulse response.

// also within the Convolver impl
// in the freq domain:
// Y_n(z) = X_n(z)*H_0(z) + X_{n-1}(z)*H_1(z) + ... + X_{n-15}(z)*H_15(z)
fn convolve_frame(&mut self) -> Vec<Complex<f32>> {
    // init output to accumulate onto
    let mut convolved: Vec<Complex<f32>> = Vec::new();
    for _ in 0..FFT_SIZE {
        convolved.push(Complex { re: 0., im: 0. });
    }
    for i in 0..self.ir_segments.len() {
        add_frames(&mut convolved, mult_frames(
            &self.previous_frame_q[i],
            &self.ir_segments[i]
        ));
    }
    convolved
}

// frequency domain (complex numbers) addition
// mutates f1!
pub fn add_frames(f1: &mut [Complex<f32>], f2: Vec<Complex<f32>>) {
    for (sample1, sample2) in f1.iter_mut().zip(f2) {
        sample1.re = sample1.re + sample2.re;
        sample1.im = sample1.im + sample2.im;
    }
}

// freq domain (complex numbers) multiplication
// ReY[f] = ReX[f]ReH[f] - ImX[f]ImH[f]
// ImY[f] = ImX[f]ReH[f] + ReX[f]ImH[f]
//
// returns a new vec
pub fn mult_frames(f1: &[Complex<f32>], f2: &[Complex<f32>]) -> Vec<Complex<f32>> {
    let mut out: Vec<Complex<f32>> = Vec::new();
    for (sample1, sample2) in f1.iter().zip(f2) {
        out.push(Complex {
            re: (sample1.re * sample2.re) - (sample1.im * sample2.im),
            im: (sample1.im * sample2.re) + (sample1.re * sample2.im),
        });
    }
    out
}

That’s it! A perfectly useful convolution algorithm implemented in Rust. Let's see it all together.

use std::collections::VecDeque;
use std::sync::Arc;
use rustfft::{Fft, FftPlanner, num_complex::Complex};

const FFT_SIZE: usize = 1024;

pub struct Convolver {
    ir_segments: Vec<Vec<Complex<f32>>>,           // freq domain impulse response segments
    previous_frame_q: VecDeque<Vec<Complex<f32>>>, // previous freq domain input signals
    previous_tail: Vec<f32>,                       // tail of the previous output (time domain) for overlap add
    fft_processor: Arc<dyn Fft<f32>>,
    ifft_processor: Arc<dyn Fft<f32>>,             // inverse fft
}

impl Convolver {
    // set up the saved, segmented IR
    pub fn new(ir_signal: &[f32]) -> Self {
        let mut planner = FftPlanner::<f32>::new();
        let fft_processor = planner.plan_fft_forward(FFT_SIZE);
        let ifft_processor = planner.plan_fft_inverse(FFT_SIZE);

        let ir_segments = segment_buffer(ir_signal, &fft_processor);
        let segment_count = ir_segments.len();

        Self {
            ir_segments,
            fft_processor,
            ifft_processor,
            previous_frame_q: init_previous_frame_q(segment_count),
            previous_tail: init_previous_tail(FFT_SIZE / 2),
        }
    }

    pub fn process(&mut self, input_buffer: &[f32]) -> Vec<f32> {
        let io_len = input_buffer.len();

        // segment and convert to freq domain
        let input_segments = segment_buffer(input_buffer, &self.fft_processor);
        let mut output_segments: Vec<Vec<Complex<f32>>> = Vec::new();

        // push front / pop back, then multiply
        for segment in input_segments {
            self.previous_frame_q.push_front(segment);
            self.previous_frame_q.pop_back();
            output_segments.push(self.convolve_frame());
        }

        // go back to time domain
        let mut time_domain: Vec<f32> = Vec::new();
        for mut segment in output_segments {
            self.ifft_processor.process(&mut segment);
            for sample in segment {
                time_domain.push(sample.re);
            }
        }

        // overlap add
        for (i, sample) in self.previous_tail.iter().enumerate() {
            match time_domain.get_mut(i) {
                Some(out_sample) => *out_sample += sample,
                None => break,
            }
        }

        // everything outside of the buffer length is the tail for the next run
        self.previous_tail = time_domain[io_len..time_domain.len()].to_vec();

        // return a buffer's worth of signal
        time_domain[0..io_len].to_vec()
    }

    // in the freq domain:
    // Y_n(z) = X_n(z)*H_0(z) + X_{n-1}(z)*H_1(z) + ... + X_{n-15}(z)*H_15(z)
    fn convolve_frame(&mut self) -> Vec<Complex<f32>> {
        // init output to accumulate onto
        let mut convolved: Vec<Complex<f32>> = Vec::new();
        for _ in 0..FFT_SIZE {
            convolved.push(Complex { re: 0., im: 0. });
        }
        for i in 0..self.ir_segments.len() {
            add_frames(&mut convolved, mult_frames(
                &self.previous_frame_q[i],
                &self.ir_segments[i]
            ));
        }
        convolved
    }
}

// frequency domain (complex numbers) addition
// mutates the first frame!
pub fn add_frames(f1: &mut [Complex<f32>], f2: Vec<Complex<f32>>) {
    for (sample1, sample2) in f1.iter_mut().zip(f2) {
        sample1.re = sample1.re + sample2.re;
        sample1.im = sample1.im + sample2.im;
    }
}

// freq domain (complex numbers) multiplication
// ReY[f] = ReX[f]ReH[f] - ImX[f]ImH[f]
// ImY[f] = ImX[f]ReH[f] + ReX[f]ImH[f]
//
// returns a new vec
pub fn mult_frames(f1: &[Complex<f32>], f2: &[Complex<f32>]) -> Vec<Complex<f32>> {
    let mut out: Vec<Complex<f32>> = Vec::new();
    for (sample1, sample2) in f1.iter().zip(f2) {
        out.push(Complex {
            re: (sample1.re * sample2.re) - (sample1.im * sample2.im),
            im: (sample1.im * sample2.re) + (sample1.re * sample2.im),
        });
    }
    out
}

pub fn init_previous_tail(size: usize) -> Vec<f32> {
    let mut tail = Vec::new();
    for _ in 0..size {
        tail.push(0.);
    }
    tail
}

// - split the buffer into segments of FFT_SIZE / 2 samples
// - zero-pad each segment out to FFT_SIZE
// - FFT each segment and hold onto it
pub fn segment_buffer(buffer: &[f32], fft_processor: &Arc<dyn Fft<f32>>) -> Vec<Vec<Complex<f32>>> {
    let mut segments = Vec::new();
    let segment_size = FFT_SIZE / 2;
    let mut index = 0;
    while index < buffer.len() {
        let mut new_segment: Vec<Complex<f32>> = Vec::new();
        for i in index..index + segment_size {
            match buffer.get(i) {
                Some(sample) => new_segment.push(Complex { re: *sample, im: 0. }),
                None => continue,
            }
        }
        while new_segment.len() < FFT_SIZE {
            new_segment.push(Complex { re: 0., im: 0. });
        }
        fft_processor.process(&mut new_segment);
        segments.push(new_segment);
        index += segment_size;
    }
    segments
}

// queue of previous input segments in the frequency domain (complex numbers)
// initialized to 0s
pub fn init_previous_frame_q(segment_count: usize) -> VecDeque<Vec<Complex<f32>>> {
    let mut q = VecDeque::new();
    for _ in 0..segment_count {
        let mut empty = Vec::new();
        for _ in 0..FFT_SIZE {
            empty.push(Complex { re: 0., im: 0. });
        }
        q.push_back(empty);
    }
    q
}

You can download the finished VST if you like. Keep in mind it doesn't have a UI of its own, so you will need to use a DAW that can provide one.

Please do get in touch with me if you have any questions (or advice!).

Matt Borden Jardine