Tuesday, November 27, 2012

Audio IIR v FIR EQs


Digital filters come in two flavors: IIR (or "Infinite Impulse Response") and FIR (or "Finite Impulse Response"). Those complex acronyms may confuse you, so let's shed a little light on the situation by defining both and explaining the differences.

Some people are interested in which is better. Unfortunately, as with many things, there is no easy answer to that question, other than "it depends", and sometimes what it depends on is your ears. I won't stray too deep into the field of opinions, but I will try to mention why some people claim one is better than the other and what some of the advantages and disadvantages are in different situations.

How Filters Work

When you design a filter, you start with a set of specifications. To audio engineers, this might be a bit vague, like "boost 1 kHz by 3 dB", but electrical engineers are usually trained to design filters with very specific constraints. However you start, there's usually a long set of equations and rules used to "design" the filter, depending on what type of filter you are designing and what the specific constraints are (to see one way you might design a filter, see this post on audio eq design). Once the filter is "designed" you can actually process audio samples.

IIR Filters

Once the filter is designed, it is implemented as a difference equation, like this:

    y[i] = a0 * x[i] + a1 * x[i-1] + ... + an * x[i-n] - b1 * y[i-1] - ... - bm * y[i-m]

In this case, y is an array storing the output, and x is an array storing the input. Note that each output is a linear function of previous inputs and outputs, as well as the current input.

In order to know the current value of y, we need to know the previous value of y, and to know that, we need the value before that, and so on, all the way back to our initial conditions. For this reason, this kind of filter is sometimes called a "recursive" filter. In principle, this filter can be given a finite input, and it will produce output forever. Because its response is infinite, we call this filter an IIR, or "Infinite Impulse Response" filter.
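
To make that concrete, here's a minimal sketch (my own illustration, not a standard design) of about the simplest recursive filter there is: a one-pole low-pass. Feed it a single impulse and the output decays forever, never quite reaching zero -- hence "infinite impulse response":

float onePoleLowpass( float x )
{
   static float y = 0;       /* the previous output -- the recursive part */
   const float g = 0.99f;    /* 0 < g < 1; closer to 1 means a lower cutoff */
   y = (1 - g) * x + g * y;  /* each output depends on the previous output */
   return y;
}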

(To further confuse the terminology, IIR filters are often designed with certain constraints that make them "minimum phase." While IIR filters are not all minimum phase, many people use the terms "recursive", "IIR" and "minimum phase" interchangeably.)

Digital IIR filters are often modeled after analog filters. In many ways, analog-modeled IIR filters sound like analog filters. They are very efficient, too: for audio purposes, they usually only require a few multiplies.

FIR Filters

FIR filters, on the other hand, are usually implemented with a difference equation that looks like this:

    y[i] = a0 * x[i] + a1 * x[i-1] + a2 * x[i-2] + ... + an * x[i-n] + an * x[i-n-1] + ... + a1 * x[i-2n] + a0 * x[i-2n-1]

In this case, we don't use previous outputs: to calculate the current output, we only need to know the current input and some number of previous inputs. This can improve the numerical stability of the filter, because roundoff errors are not accumulated inside the filter. Generally speaking, though, FIR filters are much more CPU intensive for a comparable response, and they have some other problems, such as high latency and both pass-band and stop-band ripple.

If an FIR filter can be implemented using a difference equation that is symmetrical, like the one above, it has a special property called "linear phase." Linear phase filters delay all frequencies in the signal by the same amount, which is not possible with IIR filters.
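
To illustrate, here's a sketch of a symmetric FIR filter applied by direct convolution. The taps[] array is assumed to come from some design procedure (a windowed sinc, say) and to be symmetric, which is what buys you linear phase; the tap count here is just a placeholder:

#define NTAPS 101   /* an odd length keeps the delay a whole number of samples */

float firProcess( float x, const float taps[NTAPS] )
{
   static float history[NTAPS];         /* the last NTAPS inputs */
   for( int i = NTAPS - 1; i > 0; --i ) /* shift in the new input */
      history[i] = history[i-1];
   history[0] = x;

   float y = 0;                         /* weighted sum of inputs only -- no feedback */
   for( int i = 0; i < NTAPS; ++i )
      y += taps[i] * history[i];
   return y;
}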

Which Filter?

When deciding which filter to use, there are many things to take into account. Here are some of those things:

  • Some people feel that linear phase FIR filters sound more natural and have fewer "artifacts".
  • FIR filters are usually much more processor intensive for the same response.
  • FIR filters have "ripple" in both the passband and stopband, meaning the response is "jumpy". IIR filters can be designed without any ripple.
  • IIR filters can be easily designed to sound like analog filters.
  • IIR filters require careful design to ensure stability and good numerical error properties; however, that art is fairly advanced.
  • FIR filters generally have a higher latency.

Thursday, August 23, 2012

Basic Audio EQs

In my last post, I looked at why it's usually better to do EQ (or filtering) in the time domain than the frequency domain as far as audio is concerned, but I didn't spend much time explaining how you might implement a time-domain EQ. That's what I'm going to do now.

The theory behind time-domain filters could fill a book. Instead of trying to cram you full of theory, we'll just skip ahead to what you need to know to do it. I'll assume you already have some idea of what a filter is.

Audio EQ Cookbook

The Audio EQ Cookbook by Robert Bristow-Johnson is a great, albeit very terse, description of how to build basic audio EQs. These EQs can be described as second order digital filters, sometimes called "biquads" because the equation that describes them contains two quadratics. In audio, we sometimes use other kinds of filters, but second order filters are a real workhorse. First order filters don't do much: they generally just allow us to adjust the overall balance of high and low frequencies. This can be useful in "tone control" circuits, like you might find on some stereos and guitars, but not much else. Second order filters give us more control -- we can "dial in" a specific frequency, or increase or decrease frequencies above or below a certain threshold, with a fair degree of accuracy. If we need even more control than a second order filter offers, we can often simply take several second order filters and place them in series to simulate the effect of a single higher order filter.

Notice I said series, though. Don't try putting these filters in parallel, because they not only alter the frequency response, but also the phase response, so when you put them in parallel you might get unexpected results. For example, if you take a so-called all-pass filter and put it in parallel with no filter, the result will not be a flat frequency response, even though you've combined the output of two signals that have the same frequency response as the original signal.

Using the Audio EQ Cookbook, we can design a peaking, high-pass, low-pass, band-pass, notch (or band-stop), or shelving filter. These are the basic filters used in audio. We can even design that crazy all-pass filter I mentioned, which actually does come in handy if you are building a phaser. (It has other uses, too, but that's for another post.)
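
To see the parallel problem (and the phaser) in action, here's a quick sketch of a first-order all-pass mixed with the dry signal; the coefficient is just a value I picked for illustration. Each path alone has a perfectly flat magnitude response, but the mix does not: at DC the two paths add, and at Nyquist they cancel. Cascade several such stages and sweep the coefficient, and you have a phaser:

float allpassFirstOrder( float x )
{
   static float x1 = 0, y1 = 0;
   const float g = 0.7f;        /* -1 < g < 1 sets where the phase flips */
   float y = -g*x + x1 + g*y1;  /* flat magnitude, frequency-dependent phase */
   x1 = x;
   y1 = y;
   return y;
}

float phaserSample( float x )
{
   return 0.5f * ( x + allpassFirstOrder( x ) );  /* dry + all-pass: no longer flat! */
}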

Bell Filter

Let's design a "bell", or "peaking" filter using RBJ's cookbook. Most other filters in the cookbook are either similar to the bell or simpler, so once you understand the bell, you're golden. To start with, you will need to know the sample rate of the audio going into and coming out of your filter, and the center frequency of your filter. The center frequency, in the case of the bell filter, is the frequency that is most affected by your filter. You will also want to define the width of the filter, which can be done in a number of ways, usually with some variation on "Q" (or "quality factor") or "bandwidth". RBJ's filters define bandwidth in octaves, and you want to be careful not to extend the top of the bandwidth above the Nyquist frequency (1/2 the sample rate), or your filter won't work. We also need to know how much to boost the center frequency, in dB (to cut, we just use a negative value; for no change, we use 0).

Fs = Sample Rate
F0 = Center Frequency (always less than Fs/2)
BW = Bandwidth in octaves
g = gain in dB

Great! Now we are ready to begin our calculations. First, RBJ suggests calculating some intermediate values:

A = 10^(g/40)
w0 = 2*pi*F0/Fs
c = cos(w0)
s = sin(w0)
alpha = s * sinh( ln(2)/2 * BW * w0/s )

This is a great chance to use that hyperbolic sin button on your scientific calculator that, until now, has only been collecting dust. Now that we've done that, we can finally calculate the filter coefficients, which we use when actually processing data:

b0 = 1 + alpha*A
b1 = -2*c
b2 = 1 - alpha*A
a0 = 1 + alpha/A
a1 = -2*c
a2 = 1 - alpha/A

Generally speaking, we want to "normalize" these coefficients, so that a0 = 1. We can do this by dividing each coefficient by a0. Do this in advance or the electrical engineers will laugh at you:

b0 /= a0
b1 /= a0
b2 /= a0
a1 /= a0
a2 /= a0
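
Here's the whole design calculation gathered into one C function -- just a sketch of the math above, with names following the cookbook (note that M_PI may need defining on some compilers):

#include <math.h>

void designPeakingEq( double Fs, double F0, double BW, double gainDb,
                      double *b0, double *b1, double *b2,
                      double *a1, double *a2 )
{
   double A     = pow( 10.0, gainDb / 40.0 );
   double w0    = 2.0 * M_PI * F0 / Fs;
   double c     = cos( w0 );
   double s     = sin( w0 );
   double alpha = s * sinh( log(2.0)/2.0 * BW * w0/s );
   double a0    = 1.0 + alpha / A;   /* only used for normalizing */

   *b0 = ( 1.0 + alpha * A ) / a0;
   *b1 = ( -2.0 * c )        / a0;
   *b2 = ( 1.0 - alpha * A ) / a0;
   *a1 = ( -2.0 * c )        / a0;
   *a2 = ( 1.0 - alpha / A ) / a0;
}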

Now, in pseudocode, here's how we process our data, one sample at a time using a "process" function that looks something like this:

number b0, b1, b2, a1, a2;         // the normalized coefficients from above
number xmem1, xmem2, ymem1, ymem2; // input and output history

void reset() {
   xmem1 = xmem2 = ymem1 = ymem2 = 0;
}

number process( number x ) {
   number y = b0*x + b1*xmem1 + b2*xmem2 - a1*ymem1 - a2*ymem2;

   xmem2 = xmem1;
   xmem1 = x;
   ymem2 = ymem1;
   ymem1 = y;

   return y;
}

You'll probably have some kind of loop that your process function goes in, since it will get called once for each audio sample.

There's actually more than one way to implement the process function given that particular set of coefficients. This implementation is called "Direct Form I" and happens to work pretty darn well most of the time. "Direct form II" has some admirers, but those people are either suffering from graduate-school-induced trauma or actually have some very good reason for doing what they are doing that in all likelihood does not apply to you. There are of course other implementations, but DFI is a good place to start.

You may have noticed that the output of the filter, y, is stored and used as an input to future iterations. The filter is therefore "recursive". This has several implications:

  • The filter is fairly sensitive to errors in the recursive values and coefficients. Because of this, we need to pay attention to the error in our y values. In practice, on computers, we usually just need to use a high resolution floating point value (i.e. double precision) to store these (on fixed point hardware, it is often another matter).
  • Another issue is that you can't just blindly set the values of your coefficients, or your filter may become unstable. Fortunately, the coefficients that come out of RBJ's equations always result in stable filters, but don't go messing around. For example, you might be tempted to interpolate coefficients from one set of values to another to simulate a filter sweep. Resist this temptation or you will unleash the numerical fury of hell! The values in between will be "unstable", meaning that your output will run off to infinity. Madness, delirium, vomiting and broken speakers are often the unfortunate casualties.
  • On some platforms you will have to deal with something called "denormal" numbers. This is a major pain in the ass, I'm sorry to say. Basically it means our performance will be between 10 and 100 times worse than it should be because the CPU is busy calculating tiny numbers you don't care about. This is one of the rare cases where I would advocate optimizing before you measure a problem, because the issue can come and go as your code moves around, which makes it very hard to trace. In this case, the easiest solution is probably to do something like this (imagine we are in C for a moment):


#define IS_DENORMAL(f) (((*(unsigned int *)&(f))&0x7f800000) == 0)

float b0, b1, b2, a1, a2; /* the normalized coefficients from above */
float xmem1, xmem2, ymem1, ymem2;

void reset() {
   xmem1 = xmem2 = ymem1 = ymem2 = 0;
}

float process( float x ) {
   float y = b0*x + b1*xmem1 + b2*xmem2 - a1*ymem1 - a2*ymem2;

   /* flush denormals to zero so the CPU doesn't waste time on them */
   if( IS_DENORMAL( y ) )
      y = 0;

   xmem2 = xmem1;
   xmem1 = x;
   ymem2 = ymem1;
   ymem1 = y;

   return y;
}

Okay, happy filtering!

Wednesday, August 8, 2012

Why EQ Is Done In the Time Domain

In my last post, I discussed how various audio processing may be best done in the frequency or time domain. Specifically, I suggested that EQ, which is a filter that alters the frequency balance of a signal, is best done in the time domain, not the frequency domain. (See my next post if you want to learn how to implement a time-domain filter.)

If this seems counterintuitive to you, rest assured you are not alone. I've been following the "audio" and "FFT" tags (among others) on Stack Overflow, and it's clear that many people attempt to implement EQs in the frequency domain, only to find that they run into a variety of problems.

Frequency Domain Filters

Let's say you want to eliminate or reduce high frequencies in your signal. This is called a "low-pass" filter or, less commonly, a "high-cut" filter. In the frequency domain, high frequencies get "sorted" into designated "bins", where you can manipulate them or even set them to zero. This seems like an ideal way to do low-pass filtering, but let's explore the process to see why it might not work out so well.

Our first attempt at a low-pass filter, implemented with the FFT might look something like this:
  • loop on audio input
  • if enough audio is received, perform FFT, which gives us audio in the frequency domain
    • in frequency domain, perform manipulations we want. In the case of eliminating high frequencies, we set the bins representing high frequencies to 0.
    • perform inverse FFT, to get audio back in time domain
    • output that chunk of audio

But there are quite a few problems with that approach:
  • We must wait for a chunk of audio before we can even begin processing, which means that we will incur latency. The higher quality the filter we want, the more audio we need to wait for. If the input buffer size does not match the FFT size, extra buffering needs to be done.
  • The FFT, though efficient compared to the DFT (which is the FFT without the "fast" part), performs worse than linear time, and we need to do both the FFT and its inverse, which is computationally similar. EQing with the FFT is therefore generally very inefficient compared to comparable time-domain filters.
  • Because our output chunk has been processed in the frequency domain independent of samples in neighboring chunks, the audio in neighboring chunks may not be continuous. One solution is to process the entire file as one chunk (which only works for offline, rather than real-time, processing, and is computationally expensive). The better solution is the OLA, or overlap-add, method (sketched after this list), but this involves complexity that many people miss when implementing a filter this way.
  • Filters implemented via the FFT, as well as time-domain filters designed via the inverse FFT, often do not perform the way people expect. For example, many people expect that if they set all values in bins above a certain frequency to 0, then all frequencies above the given frequency will be eliminated. This is not the case. Instead, the frequency response at the bin values will be 0, but the frequency response between those values is free to fluctuate -- and it does fluctuate, often greatly. This fluctuation is called "ripple." There are techniques for reducing ripple, but they are complex, and they don't eliminate it. Note that, in general, frequencies across the entire spectrum are subject to ripple, so even just manipulating a small frequency band may create ripple across the entire frequency spectrum.
  • FFT filters suffer from so-called "pre-echo", where parts of a sound can be heard before the main sound hits. In and of itself, this isn't really a problem, but sounds are "smeared" so badly by many designs that many in the audio world feel these filters can affect the impact of transients and stereo imaging if not implemented and used correctly.
So it's clear that FFT filters may not be right, or if they are, they involve much more complexity than many people first realize.
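
For the curious, here's roughly what the OLA bookkeeping looks like -- a bare-bones sketch, not production code. fft() and ifft() stand in for whatever transform library you use (assumed to work in place on split real/imaginary arrays, with the inverse normalized), and Hre/Him are assumed to hold the precomputed frequency response of an M-tap FIR filter zero-padded to length N:

#define N 1024          /* FFT size */
#define M 129           /* FIR filter length */
#define L (N - M + 1)   /* input samples consumed per FFT */

void fft( float *re, float *im, int n );   /* assumed available */
void ifft( float *re, float *im, int n );  /* assumed available */

void olaFilterBlock( const float *in, float *out,
                     const float *Hre, const float *Him )
{
   static float overlap[M - 1];     /* the tail carried between blocks */
   float re[N] = {0}, im[N] = {0};

   for( int i = 0; i < L; ++i )     /* zero-padded input block */
      re[i] = in[i];

   fft( re, im, N );
   for( int i = 0; i < N; ++i ) {   /* complex multiply by the response */
      float tr = re[i]*Hre[i] - im[i]*Him[i];
      im[i]    = re[i]*Him[i] + im[i]*Hre[i];
      re[i]    = tr;
   }
   ifft( re, im, N );

   for( int i = 0; i < M - 1; ++i ) /* add the previous block's tail... */
      re[i] += overlap[i];
   for( int i = 0; i < M - 1; ++i ) /* ...and save this block's tail */
      overlap[i] = re[L + i];
   for( int i = 0; i < L; ++i )
      out[i] = re[i];
}

And that's with the input buffering, windowing and latency still left to deal with.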

As a side note, one case where it might be worth all that work is a special class of FIR filters (also sometimes called "linear phase" filters). These are sometimes used in audio production and in other cases. In audio, they are usually used only in mastering because of their high latency and computational cost, but even then, many engineers don't like them (while others swear by them). FIR filters are best implemented in the time domain as well, until the number of "taps" in the filter becomes enormous, which it sometimes does, at which point it actually becomes more efficient to implement them using an FFT with OLA. FIR filters suffer from many of the problems mentioned above, including pre-echo, high computational cost and latency, but they do have some acoustical properties that make them desirable in some applications.

Time Domain Filters

Let's try removing high frequencies in the time domain instead. In the time domain, high frequencies are represented by the parts of the signal that change quickly, and low frequencies are represented as the parts that change slowly. One simple way to remove high frequencies, then, would be to use a moving average filter:

y(n) = ( x(n) + x(n-1) + ... + x(n-M) ) / (M+1)

where x(i) is your input sample at time i, and y(i) is your output sample at time i. No FFT required. (This is not the best filter for removing high frequencies -- in fact, we can do WAY better -- but it is my favorite way to illustrate the point. The moving average filter is not uncommon in economics, image processing and other fields partly for this reason.) Several advantages are immediately obvious, and some are not so obvious (a direct C transcription of the filter follows the list):
  • Each input sample can be processed one at a time to produce one output sample without having to chunk or wait for more audio. Therefore, there are also no continuity issues and minimal latency.
  • It is extremely efficient, with only a few multiplies, adds and memory stores/retrievals required per sample.
  • These filters can be designed to closely mimic analog filters.
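
Here's the moving average equation above transcribed directly into C (a sketch; M here is just a number I picked):

#define M 8   /* how many previous samples to average in */

float movingAverage( float x )
{
   static float past[M];              /* the last M inputs */
   float sum = x;
   for( int i = 0; i < M; ++i )
      sum += past[i];
   for( int i = M - 1; i > 0; --i )   /* shift the history */
      past[i] = past[i-1];
   past[0] = x;
   return sum / (M + 1);
}
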
A major disadvantage is that it is not immediately obvious how to design a high-quality filter in the time domain. In fact, it can take some serious math to do so. It's also worth noting that many time-domain filters, like frequency domain filters, suffer from ripple, but for many design methods this ripple is well defined and can be limited in various ways.

In the end, the general rule is that for a given performance, you can get much better results in the time domain than the frequency domain.

Saturday, August 4, 2012

When to (not) use the FFT

In the last post I discussed one use for the FFT: pitch tracking. I also mentioned that there were better ways to do pitch tracking. Indeed, aside from improvements on that method, you could also use entirely different methods that don't rely on the FFT at all.

The FFT transforms data into the "frequency domain", or, if your data is broken down into chunks, the FFT transforms it into the "time-frequency domain," which we often still think of as the frequency domain. However, the most basic "domain" you can work in is usually the "time domain." In the time domain, audio is represented as a sequence of amplitude values. You may know this as "PCM" audio. This is what's usually stored in WAVs and AIFFs, and when we access audio devices like soundcards, this is the most natural way to transfer data. It turns out we can do a whole lot of processing and analysis in the time domain as well.

Process                                          Time Domain   Frequency Domain
Filtering/EQ                                     Yes!          No!
Pitch Shifting                                   Okay          Okay
Pitch Tracking                                   Okay          Okay
Reverb (Simulated)                               Yes!          No!
Reverb (Impulse)                                 No!           Yes!
Guitar effects (chorus/flanger/distortion/etc.)  Yes!          No!
SR Conversion                                    Yes!          No!
Compression                                      Yes!          No!
Panning, Mixing, etc.                            Yes!          No!
Table 1: Recommendations for Audio Processing in the Time Domain vs. the Frequency Domain


Wow, so impulse reverb is really the only thing on that list you need an FFT for? Actually, even that can be done in the time domain; it's just much more efficient in the frequency domain (so much so that the time-domain approach might be considered impractical).

You might wonder how to adjust the frequency balance of a signal, which is what an EQ does, in the time domain rather than the frequency domain. Well, you can do it in the frequency domain, but you are asking for trouble. I'll talk about this in my next post.

Sunday, July 22, 2012

Frequency detection using the FFT (aka pitch tracking) With Source Code

It's not necessarily as simple as it seems to find the pitch from an FFT. Some pre-processing is required, as well as some knowledge of how the data is organized.

How to track pitch with the FFT seems to be a very commonly asked question on stack overflow. Many people seem to think tracking pitch is as simple as putting your data into an FFT, and looking at the result. Unfortunately, this is not the case. Simply applying an FFT to your input, even if you know what size FFT to use, is not going to give you optimal results, although it might work in some cases.

At the end of the day, the FFT is not actually the best method available for tracking or detecting the pitch of an audio signal. While it is possible to make a good pitch tracker using the FFT, doing it right requires a tremendous amount of work. The algorithm shown here works, and works pretty well, but if you need something that converges on the correct pitch really quickly, is very accurate, or tracks multiple notes simultaneously, you need something else.

Still, you can create a decent pitch tracking algorithm that's reasonably easy to understand using the FFT. It doesn't require too much work, and I've explained it and provided code, in the form of a command-line C guitar tuner app which you can get from github. It compiles and runs on Mac OS X, and you should be able to get it to run on other platforms without much trouble. If you want to port to other languages, that shouldn't be too hard either. It's worth noting that I specifically designed this app to be similar to the tuner described by Craig A. Lindley in Digital Audio with Java, so if you are looking for Java source code, you can check out his code (although there are differences between his code and mine).

The Big Picture

To do our pitch detection, we basically loop on the following steps:

  1. Read enough data to fill the FFT
  2. Low-pass the data
  3. Apply a window to the data
  4. Transform the data using the FFT
  5. Find the peak value in the transformed data
  6. Compute the peak frequency from the index of the peak value in the transformed data

This is the main processing loop for the tuner, with some stuff left out:

   while( running )
   {
      // read some data
      err = Pa_ReadStream( stream, data, FFT_SIZE );

      // low-pass
      for( int j=0; j<FFT_SIZE; ++j ) {
         data[j] = processSecondOrderFilter( data[j], mem1, a, b );
         data[j] = processSecondOrderFilter( data[j], mem2, a, b );
      }
      // window
      applyWindow( window, data, FFT_SIZE );

      // do the fft
      for( int j=0; j<FFT_SIZE; ++j )
         datai[j] = 0;
      applyfft( fft, data, datai, false );

      //find the peak
      float maxVal = -1;
      int maxIndex = -1;
      for( int j=0; j<FFT_SIZE; ++j ) {
         float v = data[j] * data[j] + datai[j] * datai[j] ;
         if( v > maxVal ) {
            maxVal = v;
            maxIndex = j;
         }
      }
      float freq = freqTable[maxIndex];
      //...
  }

Let's go over each of the steps and see how they work.

Audio Data

We always need to start with a sequence of numbers representing the amplitude of audio over time (sometimes called "linear PCM audio"). This is what we get from most uncompressed audio formats like AIFF and WAV. It's also what you get from audio APIs like ASIO, CoreAudio and ALSA. In this case, we are using PortAudio, which acts like a portable wrapper around these and other APIs. If you have a compressed format such as MP3 or OGG, you will have to convert it to uncompressed audio first.

Your data might be 16-bit integer, 8-bit integer, 32-bit floating point or any number of other formats. We'll assume you know how to get your data to floating point representation in the range from -1 to 1. PortAudio takes care of this for us when we specify these input parameters:


   inputParameters.device = Pa_GetDefaultInputDevice();
   inputParameters.channelCount = 1;
   inputParameters.sampleFormat = paFloat32;
   inputParameters.suggestedLatency = Pa_GetDeviceInfo( inputParameters.device )->defaultHighInputLatency;
   inputParameters.hostApiSpecificStreamInfo = NULL;


You'll also need to know how often your audio is sampled. For a tuner, less is more, so we'll use a sample rate of 8 kHz, which is available on most hardware. This is extremely low for most audio applications (44.1 kHz is considered standard for audio and 48 kHz is standard for video), but for a tuner, 8 kHz is plenty.

#define SAMPLE_RATE (8000)

Low-Pass Filtering

There's no hard and fast rule about low-pass filtering (or simply "low-passing") your audio data. In fact, it's not even strictly necessary, but doing so can get rid of unwanted noise and the higher frequencies that sometimes masquerade as the fundamental frequency. This is important because some instruments have component frequencies called harmonics that are more powerful than the "fundamental" frequencies, and usually we are interested in the fundamental frequencies. Filtering, therefore, can improve the reliability of the rest of the pitch tracker significantly. Without filtering, some noise might appear to be the dominant pitch, or, more likely, the dominant pitch might appear to be a harmonic of the actual fundamental frequency.

A good choice for the filter is a low-pass filter with a center frequency around or a little above the highest pitch you expect to detect. For a guitar tuner, this might be the high E string, or about 330 Hz. So that's what we'll use -- in fact, we low-pass it twice. If you are modifying the code for another purpose, you can set the center frequency to something that makes sense for your application.

If you aren't sure, or you want something less aggressive, you could try a moving average filter, which simply outputs the average of the current input and some number of previous inputs. Intuitively, we can understand that this filter reduces high frequencies, because signals that change quickly get "smoothed" out.

// Process every sample of your input with this function
// (this is not used in our guitar tuner)
float twoPointMovingAverageFilter( float input ) {
   static float lastInput = 0;
   float output = ( input + lastInput ) / 2 ;
   lastInput = input;
   return output;
}

The moving average filter won't make a huge difference, but if the low pass filter I used in my code doesn't suit you and you don't have the degree in electrical engineering required to design the right digital filter (or don't know what the right filter is), it might be better than nothing. I haven't tested the moving average filter myself.

Windowing

Generally speaking, FFTs work in chunks of data, but your input is a long or even continuous stream. To fit this round peg into this square hole, you need to break off chunks of your input, and process the chunks. However, doing so without proper treatment may prove detrimental to your results. In rough terms, the problem is that the edges get lopped off very sloppily, creating artifacts at frequencies that aren't actually present in your signal. These artifacts, called "sidelobes", cause problems for many applications. I know that some tuners are designed without special treatment, so you can skip this step, but I strongly recommend you keep reading because it's easy to deal with this problem.

To reduce the sidelobes, we premultiply each chunk of audio with another signal called a window, or window function. Two simple and popular choices for window functions are the Hamming window, and the Hann window. I put code for both in the tuner, but I used the Hann window.

void buildHanWindow( float *window, int size )
{
   for( int i = 0; i < size; ++i )
      window[i] = .5 * ( 1 - cos( 2 * M_PI * i / (size-1.0) ) );
}
void applyWindow( float *window, float *data, int size )
{
   for( int i = 0; i < size; ++i )
      data[i] *= window[i];
}

For a tuning app, the windows may overlap, or there may be gaps in between them, depending on your needs and your available processing power. For example, by overlapping and performing more FFTs, and then averaging the results, you may get more accurate results more quickly, at the cost of more CPU time. I strongly recommend doing this in real apps. I did not do this in my app to make the code easier to follow, and you'll see that the values sometimes jump around and don't respond smoothly.

FFT

The FFT, or Fast Fourier Transform, is an algorithm for quickly computing the frequencies that comprise a given signal. By quickly, we mean O(N log N). This is way faster than the O(N^2) the Fourier transform took to compute before the "fast" algorithm was worked out, but still not linear, so you are going to have to be mindful of performance when you use it. Because the FFT is now the standard way to compute the Fourier transform, many people use the terms interchangeably, even though this is not strictly correct.

The FFT works on a chunk of samples at a time. You don't get more or less data out of a Fourier Transform than you put into it, you just get it in another form. That means that if you put ten audio samples in you get ten data-points out. The difference is that these ten data points now represent energy at different frequencies instead of energy at different times, and since our data uses real numbers, and not complex, the FFT will contain some redundancies -- specifically, only the first half of the spectrum contains relevant data. That means that for ten samples in, we really only get five relevant data-points out.

Clearly, the more frequency resolution you need, the more time data you need to give it. However, at some point you will run into the problem of not being able to return results quickly enough, either because you are waiting for more input, or because it takes too long to process. Choosing the right size FFT is critical: too big and you consume lots of CPU and delay getting a response, too small and your results lack resolution.

How do we know how big our FFT should be? You can determine the accuracy of your FFT with this simple formula:

binSize = sampleRate/N ;

For example, with an FFT size of 8192 (most implementations of the FFT work best with powers of 2) and a sample rate of 44100, you can expect to get results that are accurate to within about 5.38 Hz. Not great for a tuner, but, hey, that's why we are sampling at 8000 Hz, which gives us an accuracy of better than 1 Hz. Still not perfect for, say, a 5-string bass, but you can always use a larger N if you need to. Keep in mind that getting enough samples for that much accuracy takes longer than a second, so our display only updates about once a second. That's yet another reason you might want to overlap your windows.
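
As an aside, the freqTable used in the main loop above can be built from the same formula. Something like this (the actual code in the repo may differ slightly):

float freqTable[FFT_SIZE];
for( int i = 0; i < FFT_SIZE; ++i )
   freqTable[i] = ( SAMPLE_RATE * i ) / (float)FFT_SIZE;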

The output of the FFT is an array of N complex numbers. It is possible to use both the real and imaginary part to get very accurate frequency information, but for now we'll settle for something simpler and much easier to understand: we simply look at the magnitude. To find the magnitude of each frequency component, we use the distance formula:

for( int i = 0; i < N/2; ++i )
   magnitude[i] = sqrt( real[i]*real[i] + imag[i]*imag[i] );

Now that we know the magnitude of each FFT bin, finding the frequency is simply a matter of finding the bin with the maximum magnitude. The frequency will then be the bin number times the bin size, which we computed earlier. Note that we don't actually need to compute the square root to find the maximum magnitude, so our actual code skips that step.

More

We do a bit more in our code, like identify the nearest semi-tone and find the difference between that semi-tone and the identified frequency, but for stuff like that we'll leave the code to speak for itself.