This repository contains a collection of Haskell libraries implementing a phase vocoder - an algorithm for processing time-varying signals in frequency domain. It consists of the following libraries:
vocoderis the core library. It implements the algorithms in the form of pure functions. It has minimal dependencies -- it depends only onvectorandvector-fftw. It can be used directly, but for most uses it's preferrable to use one of the abstractions provided by other libraries.vocoder-conduitwraps the algorithms provided byvocoderfor use with the conduit streaming library. This library is suitable for both off-line and on-line processing, including time transformations (e.g. time stretching).vocoder-audiois a convenience library which allows to easily usevocoder-conduitwith audio streams provided byconduit-audio.vocoder-dunaiwraps the algorithms provided byvocoderfor use with dunai and rhine FRP libraries. This library is suitable for real-time signal processing.
The phase vocoder algorithm consists of several stages:
- Splitting the input signal into overlapping frames of certain size, and applying a window function.
- Computing the FFT of the frames.
- Analysing the phases, which involves comparing the phases of the current frame to the previous one, and computing phase increments.
- Processing of the spectrum. This is the part which depends on the application. This phase is also responsible for the correctness of phase increments after the processing.
- Synthesizing the phases, which involves accumulating the phase increments.
- Computing the IFFT.
- Combining the overlapping frames into the output signal.
The analysis and synthesis phases are implemented in the vocoder library.
The convenience libraries vocoder-conduit and vocoder-dunai make it easy to write the application-specific processing stage and to use it to create the full vocoder pipeline.
The processing function has the type of Monad m => ConduitT STFTFrame STFTFrame m r, which means it is a conduit that accepts STFT frames (consisting of amplitudes and phase increments) and produces STFT frames.
To use it, the simplest way is to use the process function, of the type Monad m => VocoderParams -> ConduitT STFTFrame STFTFrame m r -> ConduitT Frame Frame m r.
The result is a conduit, which accepts a chunked stream of input data, and produces a chunked stream of output data.
For more complex usage, there is the processFrames function, of the type Monad m => VocoderParams -> (Phase, Phase) -> ConduitT STFTFrame STFTFrame m r -> ConduitT Frame Frame m (r, (Phase, Phase)).
Its arguments are: the vocoder configuration, the initial phases (for analysis and synthesis), and the processing function.
The result is a conduit, which accepts overlapped, windowed signal frames (of arbitrary size), produces overlapped signal frames (of the same size as the vocoder's hop size), and returns the result of the processing function and the final phases (for analysis and synthesis).
Initial phases can be constructed using the function zeroPhase.
The returned final phases can be used to seamlessly continue processing using a different processing function.
To use this function with chunked streams, framesOfE and sumFramesE can be used.
Because the conduit abstraction doesn't allow zipping, the library includes additional functions for simulteaneously processing multiple channels (e.g. stereo).
To do that, one should use the processFrames function, of the type (Monad m, Applicative f) => (f Phase, f Phase) -> ConduitT (f STFTFrame) (f STFTFrame) m r -> ConduitT (f Frame) (f Frame) m (r, (f Phase, f Phase)).
The applicative functor f should be a data structure for the channels, with a ,,zipping'' (field-wise) applicative instance.
Therefore using lists as f is not correct (but ZipList is).
The processing function has the type of MSF m [STFTFrame] [STFTFrame], which means it accepts sequences of STFT frames and produces sequences of STFT frames (of the same length).
The need for using sequences comes from the fact that processing in dunai is synchronous: for every input frame, exactly one output frame has to be produced, and every part of the processing chain works at the same rate.
Therefore, for hop sizes smaller than the input frame size, several STFT frames will be produced for one time-domain input frame.
To use the processing function, the simplest way is to use the process function, of the type MSF m [STFTFrame] [STFTFrame] -> MSF m Frame Frame.
Two example programs are provided to demonstrate the use of the libraries: vocoder-file in package vocoder-audio, and vocoder-jack in package vocoder-dunai.
This program uses vocoder-conduit for off-line audio file processing.
It takes a number of input files and effect specifications for each of the files, and produces an output file which is the result of processing the input files in sequence.
The inputs are joined seamlessly.
Example usage:
-
vocoder-file output.wav input.wav --lowpassBrickwall 1000Low-pass filters the
input.wavfile, leaving only the frequencies below 1000 Hz. -
vocoder-file output.wav input1.wav --pitchShiftInterpolate 2 input2.wav --playSpeed 2Changes the pitch of
input1.wavby one octave, and slows down the tempo ofinput2.wavtwo times. -
vocoder-file output.wav input.wav --playSpeed 10 --randomPhaseSlows down the tempo of
input.wavten times and randomises the STFT phases, introducing Paulstretch-style effect. -
vocoder-file output.wav input.wav --envelope 32 --randomPhaseTakes spectral envelope of
input.wavand randomises the STFT phases, which makes speech sound more like a whisper.
This program uses vocoder-dunai for processing audio in real-time through the JACK real-time audio framework.
It allows combining multiple sound inputs using a filter tree, which allows constructing interesting effects, including those known from vocoder pedals.
The filter tree is specified using postfix notation. If a filter argument is missing, input stream 0 is assumed.
Example usage:
-
vocoder-jack --lowpassButterworth 2,1000Low-pass filters the input using a filter with second-order Butterworth-like characteristics and a cut-off frequency of 1000 Hz.
-
vocoder-jack --windowSize 128 --randomPhaseRandomizes STFT phase with small window size, reducing sound clarity and creating a "distorted anonymous voice" effect.
-
vocoder-jack --envelope 32 --randomPhaseTake spectral envelope of the input and randomize the STFT phases, producing a whisper-like effect.
-
vocoder-jack --pitchShiftInterpolate 2 --addAdds the spectrum of the input signal to the spectrum lowered by one octave, creating a harmonizer effect.
-
vocoder-jack --envelope 32 --divide --source 1 --envelope 32 --multiplyCreates a talkbox-like vocoder effect. The first input's spectrum is divided by its own envelope, and multiplied by the envelope of the second input's spectrum.
-
vocoder-jack --delay 10 --amplify 0.5 --addCreate an echo effect by delaying the input by 10 STFT hops, reducing amplitude and adding to the original signal.