If this seems counterintuitive to you, rest assured you are not alone. I've been following the "audio" and "FFT" tags (among others) on Stack Overflow, and it's clear that many people attempt to implement EQs in the frequency domain, only to run into a variety of problems.

## Frequency Domain Filters

Let's say you want to eliminate or reduce high frequencies in your signal. This is called a "low-pass" filter, or, less commonly, a "high-cut" filter. In the frequency domain, high frequencies get "sorted" into designated "bins", where you can manipulate them or even set them to zero. This seems like an ideal way to do low-pass filtering, but let's explore the process to see why it might not work out so well. Our first attempt at a low-pass filter, implemented with the FFT, might look something like this:

- loop on audio input
- once enough audio is received, perform the FFT, which gives us the audio in the frequency domain
- in the frequency domain, perform the manipulations we want. In the case of eliminating high frequencies, we set the bins representing high frequencies to 0.
- perform the inverse FFT to get the audio back in the time domain
- output that chunk of audio
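Sketched in code (Python with NumPy; the function and parameter names are just for illustration), the per-chunk processing step might look like this:

```python
import numpy as np

def naive_fft_lowpass(chunk, sample_rate, cutoff_hz):
    """Naive low-pass: transform a chunk, zero the bins above cutoff_hz,
    and transform back. This works mechanically, but it exhibits the
    problems discussed below (latency, discontinuities at chunk
    boundaries, ripple between bins)."""
    spectrum = np.fft.rfft(chunk)                    # time -> frequency domain
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0                # "remove" high frequencies
    return np.fft.irfft(spectrum, n=len(chunk))      # back to time domain
```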

But there are quite a few problems with that approach:

- We must wait for a chunk of audio before we can even begin processing, which means that we will incur latency in our processing. The higher quality filter we want, the more audio we need to wait for. If the input buffer size does not match the FFT size, extra buffering needs to be done.
- The FFT, though efficient compared to the naive DFT (the FFT without the "fast" part), still runs in O(N log N) rather than linear time, and we need to perform both the FFT and its inverse, which has a similar cost. EQing with the FFT is therefore generally quite inefficient compared to comparable time-domain filters.
- Because our output chunk has been processed in the frequency domain independent of samples in neighboring chunks, the audio in neighboring chunks may not be continuous. One solution is to process the entire file as one chunk (which only works for offline, rather than real-time, processing, and is computationally expensive). The better solution is the overlap-add (OLA) method, but this introduces complexity that many people miss when implementing a filter this way.
- Filters implemented via FFT, as well as time-domain filters designed via the inverse FFT, often do not perform the way people expect. For example, many people expect that if they set all values in bins above a certain frequency to 0, then all frequencies above that frequency will be eliminated. This is not the case. Instead, the frequency response *at* the bin values will be 0, but the frequency response *between* those values is free to fluctuate -- and it does fluctuate, often greatly. This fluctuation is called "ripple." There are techniques for reducing ripple, but they are complex, and they don't eliminate it. Note that, in general, frequencies across the entire spectrum are subject to ripple, so even manipulating a small frequency band may create ripple across the entire frequency spectrum.
- FFT filters suffer from so-called "pre-echo", where sounds can be heard before the main sound hits. In and of itself, this isn't really a problem, but sounds are "smeared" so badly by many designs that many in the audio world feel these filters can affect the impact of transients and stereo imaging if not implemented and used carefully.
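Ripple is easy to demonstrate numerically. In this sketch (Python with NumPy; the sizes are arbitrary examples), we make a "brick wall" by zeroing bins, take the inverse FFT to get the corresponding impulse response, and then evaluate that filter's response on a much finer frequency grid:

```python
import numpy as np

N = 64                 # FFT size used for the "design"
cutoff_bin = 16        # keep bins 0..16, zero everything above

# Desired response sampled at the bins: a perfect brick wall
desired = np.zeros(N // 2 + 1)
desired[:cutoff_bin + 1] = 1.0

# Impulse response implied by those bin values
h = np.fft.irfft(desired, n=N)

# Evaluate the actual frequency response on a 64x finer grid
fine = np.abs(np.fft.rfft(h, n=64 * N))

at_bins = fine[::64]                        # response exactly at the bin frequencies
stopband_at_bins = at_bins[cutoff_bin + 1:] # ~0, exactly as designed
between_bins = fine[64 * (cutoff_bin + 1):] # fluctuates well above 0: ripple
```

The response really is (numerically) zero at the zeroed bin frequencies, but between them it climbs well above zero, especially near the cutoff.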

As a side note, one case where it might be worth all that work is a special class of so-called FIR filters (also sometimes called "linear phase" filters). These are sometimes used in audio production and in other fields. In audio, they are usually used only in mastering because of their high latency and computational cost, and even then many engineers don't like them (while others swear by them). FIR filters are best implemented in the time domain as well, until the number of "taps" in the filter becomes enormous, which it sometimes does; at that point it actually becomes more efficient to implement them using an FFT with overlap-add. FIR filters suffer from many of the problems mentioned above, including pre-echo, high computational cost, and latency, but they do have some acoustical properties that make them desirable in some applications.

## Time Domain Filters

Let's try removing high frequencies in the time domain instead. In the time domain, high frequencies are represented by the parts of the signal that change quickly, and low frequencies are represented by the parts that change slowly. One simple way to remove high frequencies, then, would be to use a moving average filter:

y(n) = ( x(n) + x(n-1) + ... + x(n-M) ) / (M+1)

where x(i) is your input sample at time i, and y(i) is your output sample at time i. No FFT required. (This is not the best filter for removing high frequencies -- in fact we can do WAY better -- but it is my favorite way to illustrate the point. The moving average filter is not uncommon in economics, image processing, and other fields, partly for this reason.) Several advantages are immediately obvious, and some are not so obvious:

- Input samples can be processed one at a time, each producing one output sample, without having to chunk or wait for more audio. Therefore, there are no continuity issues and minimal latency.
- It is extremely efficient, with only a few multiplies, adds and memory stores/retrievals required per sample.
- These filters can be designed to closely mimic analog filters.
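The one-sample-in, one-sample-out property is easy to see in code. This is a minimal streaming sketch in Python (the class name and structure are my own illustration; a real-time C implementation would use a fixed circular buffer):

```python
from collections import deque

class MovingAverage:
    """Streaming moving-average low-pass: y(n) = (x(n) + ... + x(n-M)) / (M+1).
    Each call to process() consumes one input sample and produces one output
    sample -- no chunking, no FFT, and only a few operations per sample."""
    def __init__(self, M):
        self.history = deque([0.0] * (M + 1), maxlen=M + 1)
        self.running_sum = 0.0
        self.scale = 1.0 / (M + 1)

    def process(self, x):
        self.running_sum += x - self.history[0]  # add newest, drop oldest
        self.history.append(x)
        return self.running_sum * self.scale
```

Keeping a running sum means the cost per sample stays constant no matter how long the average is, which is part of why this filter is so cheap.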

In the end, the general rule is that for a given computational cost, you can get much better results in the time domain than in the frequency domain.

Great article. One additional reason not to use the FFT for filtering is that, especially for short windows/low latency, the IFFT output sounds weird and jangly -- even when no transformation is done to the FFT data. (Longer windows attenuate the jangles but will smear the sound.) For speech, this may not be a big deal, but for music it's usually a very ugly sound.

This is an excellent point. The "weird" and "jangly" sound you mention, sometimes described as "hollow" or "phasy", and reminiscent of early, low-bitrate compression, is due to something I didn't get into: overlapping FFT windows, which, in this design, would be necessary to eliminate edge effects of the windows themselves. The smearing effect comes from this as well, but can also come from pre- (and post-) echo. This issue comes up in pitch shifters as well, where an art has developed around compensating for these effects, but it is not perfect.

That's interesting. I'd always assumed that the sound had more to do with resynthesizing the sound using too few sine waves, i.e. poor resolution in frequency space that fails to adequately characterize noisy sounds -- though, now that I think of it, the artifacts are most apparent in the higher frequencies, where the partials are "closer together" (in log-space).

But you say it's an artifact of overlapping windows? Is it some kind of unintended constructive interference? I can't quite imagine how overlapping windows would cause this sound. In my experience (in experimental music, where artifacts are sometimes desirable, especially smearing), very large windows (half a second or longer) make the problem go away, which would be explained if poor resolution were the culprit. If that's not the reason, then why do large windows make the problem go away?

For fun: if you want to hear one example of how large windows can make neat, smeared sounds, check out paulstretch: http://hypermammut.sourceforge.net/paulstretch/ (plug in your favorite classical music for instant drones)

There's more than one thing going on here: the windows themselves create artifacts, and putting them back together does not always fix that, so it depends not only on the size of the window, but also the shape and overlap. Also, if the data is altered in the frequency domain, it may have moved in time as well. There may be even more going on -- I usually only deal with time domain filters :)

I don't think it's correct to say that poor synthesis resolution is the culprit, however. In fact, if you ignore the issue of numerical error, no data is lost when you transform data from the time to the frequency domain or vice versa, so even if your windows are very small, all your data is still intact.

To put it another way, an EQ, even one built this way, isn't doing what I would call synthesis or re-synthesis. It might contain "analysis" and "synthesis" components, but it's not really taking apart a sound and reconstructing it using newly synthesized information the way, say, a vocoder is. It may be a matter of semantics, but mathematically, we are just changing the representation from the time domain to the Fourier domain, doing some manipulations, and converting back. The time domain and Fourier domain are equivalent in terms of the amount of information they carry, whereas re-synthesis methods generally reduce the amount of information during the analysis stage.

I was wondering how we can remove frequencies in between, i.e. implement a band-stop filter, using your time domain filtering approach. Could you explain that point? Thanks!

My next post contains a tutorial on building basic audio EQs: http://blog.bjornroche.com/2012/08/basic-audio-eqs.html I implement a bell filter, but using the linked "cookbook", you can easily modify it to produce a band-stop filter.
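(For reference, a band-stop biquad built from the cookbook's notch formulas might be sketched as follows. This is my own Python illustration of the cookbook math, not code from the linked post, and the frequency and Q values are just examples.)

```python
import math

def notch_coefficients(f0, fs, Q):
    """Band-stop (notch) biquad coefficients per the RBJ 'Audio EQ Cookbook'."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * Q)
    cos_w0 = math.cos(w0)
    b = [1.0, -2.0 * cos_w0, 1.0]              # zeros on the unit circle at f0
    a = [1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha]
    return b, a

def biquad(samples, b, a):
    """Direct Form I: one output per input sample, entirely in the time domain."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = (b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A sine at the center frequency is essentially eliminated once the transient dies out, while frequencies far from the notch pass nearly unchanged.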

I'm a little confused by what you've said here. You make it sound like frequency domain filters and time domain filters are two totally different things. You present the moving average as though it's somehow completely different from an FIR filter, but it's actually just a specific FIR filter where all the taps are equally weighted. The time and frequency domains are just two perspectives on the same thing, after all.

You are right that I gloss over some facts to keep this more digestible for people who do not have a theoretical background. It's true, for example, that the time and frequency domains are two perspectives on the same thing. However, it does not follow that we can use either one equally well, and that's what this post is about: implementation rather than theory. For example, one major problem with the frequency domain is causality: we need to know future samples in order to compute a complete frequency-domain representation of the signal, and we need that in order to filter correctly. We don't have that problem in the time domain. In the true frequency domain, we need all the samples forever into the future and past, which is infeasible for all but the shortest signals. This is why the first attempt described above breaks the audio into chunks. Strictly speaking, this puts us in the time-frequency domain, not the frequency domain. The time-frequency domain is the real-world compromise we must make if we wish to process long or real-time signals in the frequency domain, and it is fraught with the problems I describe above. Of course, it can be done, and there are many, many applications for the time-frequency domain.

As to the moving average filter, you are right that it is an FIR filter. I use it as a simple example of a time-domain filter, not as an example of FIR versus non-FIR. It happens to be FIR.