One of the things I complained about in my last post was would-be company founders who are looking for nerds to build their product based on a specific idea or vision they've already created. Obviously, there's nothing wrong with hiring someone to build a product the way you want -- I build products for people all the time based on what they had in mind (and, I should add, I find this a highly rewarding experience). So what's the problem?
The problem happens not when you hire someone to build something for you, but when you view your cofounder and/or CTO as someone who is building something for you. Your technical cofounder is not a contractor or regular employee; rather they are part of the core team responsible for the direction of the company. As such, they need to be invested in the company strategy (both literally, with stock, and emotionally) and their role is to help determine the company strategy and define the company's product.
You may think you've already defined the product and you just need someone to build it. If that's really the case, then get yourself a contractor -- ain't no shame in that! Unless you're just doing something like building a simple e-commerce website as the storefront for your existing business, though, it's probably not that simple. Building new technology involves risks and tradeoffs, and you need someone who understands them and how they will impact the company strategy. That's where the CTO comes in.
Ideally, of course, all your employees should feel empowered to have some say over their domain area, but when founding a company, it's especially important to have a CTO who is invested. The reason is simple: most engineers are happy to build a product to spec, but a CTO who is invested will make sure the spec is both technically viable and makes sense for the business. Your CTO is, therefore, not just another employee: they are your partner in nurturing your product in all stages of development from conception to completion. This may seem counterintuitive if you already have a great, multimillion dollar idea, wireframes and so on, but it so happens that unless a technical person has looked it over and vetted it thoroughly (and probably changed it a lot), your idea is far from the great, multimillion dollar idea you thought it was. Worst case: it's not even viable. (Obviously, proper vetting from a technical person is not enough to make it a great, multimillion dollar idea, either, but that's another conversation.) To put this another way, having your CTO invested in your product vision from the beginning removes significant risk from your business model because they can anticipate problems and see better solutions.
When I do contract work, there is always a technical person as part of the team that hired me. Whether that person is actually a CTO or not, they understand the requirements and know how my work is going to fit in with the business model, and they understand what tradeoffs are being made when it comes time to make a major decision. Without a person like that, who can make decisions with your business' best interests in mind, you will not be able to make the right technical decisions for your business.
In short, the job of a contractor is to build a product to spec on time and on budget. The job of the technical co-founder is to make sure that technical decisions are made with the business' best interests in mind. Both are useful people, but don't confuse the two.
bjorg
Sunday, April 28, 2013
Friday, April 26, 2013
Why you haven't found a tech co-founder
As a nerd who has accomplished a thing or two in a field that many people think it would be sexy to "disrupt" (music technology), I have been asked by many people in the last few years to co-found companies with them. These offers range from serious and interesting to what I call "drive-by's": people with some idea or other who are looking for an engineer to build it. So far, I have said no to everyone and am continuing on my own projects, which are far more interesting and rewarding.
If you are looking for a tech co-founder, I am going to tell you why you've had a hard time finding people like me, why we've probably said no to you, and what you can do to change the situation. As is always the case in business when you are stuck, you need to understand the situation from the other person's perspective in order to get what you want. Our perspective may not be right, and I am going to say some cold, harsh stuff, but I think you will feel enlightened if you stand in our shoes for a minute. For starters, I'm assuming you aren't making an amateur mistake.
Why nerds don't go looking for "non-technical" co-founders
It may or may not surprise you to discover that when nerds (or technical founders, or future-CTOs or whatever you want to call us) decide to start a company, we don't go looking for a "business" expert. Startup business just isn't that complex. It is time-consuming, but not complex. What is complex, usually, is the technology: you need to build something reliable, scalable, easy to use and awesome. Believe me when I say that's fucking hard. And you usually need to do that for way less money than a big company would spend. That's the kind of shit alpha-nerds live for. Rightly or wrongly, we don't go looking for someone with an MBA to help us build a startup, especially if the people we meet with MBAs have less of a clue about the market than we do (which is the case surprisingly often). I'm not saying there's nothing you can offer. What I'm saying is that if you do have something to offer, you need to prove it.
You might have nothing to offer
You might think you have a lot to offer, but nerds don't see it that way: we don't need non-nerds to found a company, start building product and get our first users. I know that the classic company founding team is two people: one "builder" and one "seller", presumably the nerd and the business expert. But if you look at some of the best tech companies, the "seller" was often actually a technical person who learned to sell and learned the basics of running a business. The classic tech company founding team is more accurately described as a super-nerd (the builder) and a regular nerd (the seller). But I know, you've still got something we don't: a brilliant idea! There are two problems with that: 1. we also have great ideas, and 2....
Your brilliant idea is worthless
I know it sounds cold when I say your idea is worthless, but look at the companies that have come out of the last few decades of startups. They have either been simple ideas (eBay, Amazon, etc.), copycats (Facebook), or the ideas have been technical (Google, Heroku, etc.), and let's face it, only a nerd could come up with a technical idea. Moreover, unlike you, nerds know what ideas can and can't be built. Don't get me wrong, ideas are not easy, and they are great starting points, but startups don't win on ideas: they win on execution. The value of the idea is that if everyone is invested in it and understands it, the team can work together cohesively. There's no better way to sour your team's enthusiasm for a project than to have "them" build "your" idea, and there's no better way to turn off a potential tech co-founder than to say "I have a great idea for a company, I just need someone to build it" (yes, I've heard this. A lot.). Sometimes companies do get started this way, but it's a sure sign of an unhealthy company that will be lucky to continue to be functional. If you want to find a tech cofounder, develop a relationship with them first, and then build the idea with them. One other note about ideas is that they often change once your first product release meets the market, so your super-brilliant idea is probably going to change anyway. (If that's news to you, read up on the lean startup methodology.)
What can you do about it?
If you aren't friends with any nerds who can build a business with you, you are not out of luck. Some common suggestions I've read are:
- Learn to code yourself. This might work if you are young and ambitious. It will certainly have the benefit of helping you speak the language, understand technical problems, meet other nerds and so on. I would certainly applaud the effort of learning to code, but let's face facts: you can't become your own technical cofounder unless your technical needs are very simple or you are willing to put serious time into it. There's more to being a CTO than "coding" and you are unlikely to get a deep understanding of technology by taking a crash course in Ruby.
- Outsource for your prototype. This actually works sometimes, but it will cost you and it's not as easy as it sounds. Outsourcing (whether onshore or offshore) requires that someone on your team has a clue about what's going on technically. You really need to be in touch with the technical team every day. Make sure you understand their technologies and, more importantly, their methodologies. If that sounds excessive, talk to anybody who's tried outsourcing. Keep in mind that even if you go with this strategy, you'll usually need to find a CTO and have them prove themselves before anyone will give you significant funding, which means you are going to need a plan to pay for this before you even approach a VC.
- Build your connections. Trust me: nerds will look at your LinkedIn profile. If we don't see hundreds of contacts (including at least as many VCs, angels, and potential customers as we have) we won't keep talking to you. Why should we?
- Accomplish something. What better way to prove to nerds that you are worth working with than to have done something successful in the past? If your resume is full of school and jobs rather than accomplishments, then you need to work on your resume. Try joining an early-stage startup, or helping a startup raise money. If you can do that, you will have awesome nerds knocking down your door. If you can't do that, at least try writing a successful blog, or running a meetup.
The bottom line
The bottom line here is that startups don't need average business people. They don't need people who specialize in ideas or "solving business problems." Startups need doers, and they need doers who work together. If you can't prove that you are a doer, then other doers will not be interested. Think of it this way: would an investor give you money? If not, then why should a highly accomplished nerd give you their time?
Tuesday, November 27, 2012
Audio IIR v FIR EQs
Digital filters come in two flavors: IIR (or "Infinite Impulse Response") and FIR (or "Finite Impulse Response"). Those complex acronyms may confuse you, so let's shed a little light on the situation by defining both and explaining the differences.
Some people are interested in which is better. Unfortunately, as with many things, there is no easy answer to that question, other than "it depends", and sometimes what it depends on is your ears. I won't stray too deep into the field of opinion, but I will try to mention why some people claim one is better than the other and what some of the advantages and disadvantages are in different situations.
How Filters Work
When you design a filter, you start with a set of specifications. To audio engineers, this might be a bit vague, like "boost 1 kHz by 3 dB", but electrical engineers are usually trained to design filters with very specific constraints. However you start, there's usually some long set of equations and rules used to "design" the filter, depending on what type of filter you are designing and what the specific constraints are (to see one way you might design a filter, see this post on audio eq design). Once the filter is "designed" you can actually process audio samples.
IIR Filters
Once the filter is designed, the filter itself is implemented as difference equations, like this:
y[i] = a0 * x[i] + a1 * x[i-1] + ... + an * x[i-n] - b1 * y[i-1] - ... - bm * y[i-m]
In this case, y is an array storing the output, and x is an array storing the input. Note that each output is a linear function of previous inputs and outputs, as well as the current input.
In order to know the current value of y, we need to know the last value of y, and to know that, you must know the value of still earlier values of y, and so on, all the way back until we reach our initial conditions. For this reason, this kind of filter is sometimes called a "recursive" filter. In principle, this filter can be given a finite input, and it will produce output forever. Because its response is infinite, we call this filter an IIR, or "Infinite Impulse Response" filter.
(To further confuse the terminology, IIR filters are often designed with certain constraints that make them "minimum phase." While IIR filters are not all minimum phase, many people use the terms "recursive", "IIR" and "minimum phase" interchangeably.)
Digital IIR filters are often modeled after analog filters. In many ways, analog-modeled IIR filters sound like analog filters. They are very efficient, too: for audio purposes, they usually only require a few multiplies.
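As a rough illustration of the recursive difference equation above, here is a minimal C sketch. The names `iir_filter`, `iir_process` and `IIR_MAX_ORDER` are made up for this example and do not come from any library. It keeps this post's naming, with a for the input coefficients and b for the feedback coefficients.

```c
#include <stddef.h>

#define IIR_MAX_ORDER 8  /* hypothetical limit, chosen only for this sketch */

/* State and coefficients for
 *   y[i] = a[0]*x[i] + ... + a[n]*x[i-n] - b[1]*y[i-1] - ... - b[m]*y[i-m]
 */
typedef struct {
    size_t n, m;                      /* number of past inputs / outputs used */
    double a[IIR_MAX_ORDER + 1];      /* a[0..n] */
    double b[IIR_MAX_ORDER + 1];      /* b[1..m]; b[0] is unused */
    double xhist[IIR_MAX_ORDER + 1];  /* xhist[k] holds x[i-k] */
    double yhist[IIR_MAX_ORDER + 1];  /* yhist[k] holds y[i-k]; yhist[0] is unused */
} iir_filter;

/* Process one input sample and return one output sample. */
double iir_process(iir_filter *f, double x)
{
    size_t k;
    double y = 0.0;

    /* Shift the input history and store the newest input. */
    for (k = f->n; k > 0; k--)
        f->xhist[k] = f->xhist[k - 1];
    f->xhist[0] = x;

    /* Feedforward part: current and previous inputs. */
    for (k = 0; k <= f->n; k++)
        y += f->a[k] * f->xhist[k];

    /* Feedback part: previous outputs. This is what makes it recursive. */
    for (k = 1; k <= f->m; k++)
        y -= f->b[k] * f->yhist[k];

    /* Shift the output history so the new output is y[i-1] next time. */
    for (k = f->m; k > 1; k--)
        f->yhist[k] = f->yhist[k - 1];
    if (f->m > 0)
        f->yhist[1] = y;

    return y;
}
```

With n = 0, m = 1, a[0] = 1 and b[1] = -0.5, feeding in a single 1 followed by zeros gives 1, 0.5, 0.25, 0.125 and so on: the output keeps halving and, in exact arithmetic, never reaches zero. That is the "infinite" in Infinite Impulse Response.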
FIR Filters
FIR filters, on the other hand, are usually implemented with a difference equation that looks like this:
y[i] = a0 * x[i] + a1 * x[i-1] + a2 * x[i-2] + ... + an * x[i-n] + an * x[i-n-1] + ... + a1 * x[i-2n] + a0 * x[i-2n-1]
In this case, we don't use previous outputs: in order to calculate the current output, we only need to know the current input and a fixed number of previous inputs. This may improve the numerical stability of the filter because roundoff errors are not accumulated inside the filter. However, generally speaking, FIR filters are much more CPU intensive for a comparable response, and have some other problems, such as high latency, and both pass-band and stop-band ripple.
If an FIR filter can be implemented using a difference equation that is symmetrical, like the one above, it has a special property called "linear phase." Linear phase filters delay all frequencies in the signal by the same amount, which is not possible with IIR filters.
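To make the non-recursive structure concrete, here is a small C sketch. The function names `fir_output` and `fir_is_symmetric` and the layout of the `hist` array are assumptions made for this example. The first function computes one output sample from the current and previous inputs only, and the second checks for the coefficient symmetry described above.

```c
#include <stddef.h>

/* Compute one output of an FIR filter with `ntaps` coefficients.
 * hist[k] must hold x[i-k], so hist[0] is the current input.
 * No previous outputs are used anywhere. */
double fir_output(const double *coef, const double *hist, size_t ntaps)
{
    double y = 0.0;
    size_t k;
    for (k = 0; k < ntaps; k++)
        y += coef[k] * hist[k];
    return y;
}

/* Returns 1 if the coefficients read the same forwards and backwards,
 * which is the symmetry associated with linear phase, and 0 otherwise. */
int fir_is_symmetric(const double *coef, size_t ntaps)
{
    size_t k;
    for (k = 0; k < ntaps / 2; k++)
        if (coef[k] != coef[ntaps - 1 - k])
            return 0;
    return 1;
}
```

A real implementation would usually keep the input history in a circular buffer instead of asking the caller to supply it, but the arithmetic is the same.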
Which Filter?
When deciding which filter to use, there are many things to take into account. Here are some of those things:
- Some people feel that linear phase FIR filters sound more natural and have fewer "artifacts".
- FIR filters are usually much more processor intensive for the same response.
- FIR filters have "ripple" in both the passband and stopband, meaning the response is "jumpy". IIR filters can be designed without any ripple.
- IIR filters can be easily designed to sound like analog filters.
- IIR filters require careful design to ensure stability and good numerical error properties. However, that art is fairly advanced.
- FIR filters generally have a higher latency.
Saturday, September 8, 2012
Compiling libjingle on OS X
I recently spent the day (yes, the entire day) compiling libjingle on OS X. I'm still running OS X 10.6.8, so that may have been part of the problem, but there are clearly some deeper issues. I thought I'd document the changes I had to make to the compilation instructions in case anyone else (like me in the future) has to go through this nightmare.
First off, the package includes compilation instructions in the README file. This file has some organizational issues (for example, the dependencies expat and srtp are not listed under the "prerequisites" section, but rather the "libjingle" section) and does not account for some bugs I found, but otherwise includes some pretty good detail. Unfortunately, all the "examples" they give are for Windows, so I imagine that's where all the development and testing is done. Still, you need to read it. This post is just an outline and only goes into detail where the README doesn't explain things.
Also, there's no longer an active mailing list to go to ask questions, which is sad because that would be a good place to bring these issues up (there are already bugs posted for most of the fixes). It also makes me think maybe libjingle is dead or on critical life-support. (The mailing list linked from the developer's page is currently nonexistent, and the link from their blog to the "google talk help center" goes to archive.org!) If you need help, your best bet is probably stackoverflow.com, which is a great place to go for help, but it's no substitute for a mailing list.
Compiling libjingle
- Download and extract libjingle from the google code page. I used 0.6.14 for this.
  - Be sure to extract it somewhere without any weird characters in the path (including spaces) or the build will barf.
- Create a makefile (below) at the top level of libjingle. This will be especially useful in case you need to run the build over and over again as you tweak things.
- Install the prerequisites (see the README for more details):
  - Python should already be installed.
  - To install scons, I recommend homebrew: $ brew install scons
  - Download swtoolkit and extract it as talk/third_party/swtoolkit
  - Download gtest and extract it as talk/third_party/gtest
  - Download expat 2.0.1 and extract it as talk/third_party/expat-2.0.1
  - Download srtp and extract it as talk/third_party/srtp
- Apply the following fixes:
  - Fix talk/third_party/swtoolkit/site_scons/site_init.py as described here and here.
  - Fix talk/libjingle.scons as described here.
  - Make the following two changes to talk/main.scons:
    - Comment out the line that has '-fno-rtti' in it (if you are running a newer version of OS X and up-to-date dev tools, you may not need to do this).
    - Apply the fix described here. A logical place to add the mac_env.Replace(...) is after mac_env.Append( … ).
- Holy crap! You did it! It should now build with $ make
  - If you get stuck, you may get a hint from $ make verbose
- To compile 64-bit binaries, you need to do a few more things:
  - Comment out "session/phone/carbonvideorenderer.cc" from libjingle.scons, and 'Carbon' from main.scons.
  - Change '-arch', 'i386', to '-arch', 'x86_64', in two places in main.scons
  - Though the build will terminate with errors, you should at least have the .a files you need.
======= Makefile ========
SCONS_DIR ?= /usr/local/Cellar/scons/2.2.0/libexec/scons-local/
export

default: build

talk/third_party/expat-2.0.1/Makefile:
	cd talk/third_party/expat-2.0.1 && ./configure

talk/third_party/srtp/Makefile:
	cd talk/third_party/srtp && ./configure

build: talk/third_party/expat-2.0.1/Makefile talk/third_party/srtp/Makefile
	cd talk && third_party/swtoolkit/hammer.sh

verbose: talk/third_party/expat-2.0.1/Makefile talk/third_party/srtp/Makefile
	cd talk && third_party/swtoolkit/hammer.sh --verbose

help:
	~/bin/swtoolkit/hammer.sh --help

clean:
	cd talk && third_party/swtoolkit/hammer.sh --clean
UPDATE: notes on 64-bit build.
Thursday, August 23, 2012
Basic Audio EQs
In my last post, I looked at why it's usually better to do EQ (or filtering) in the time domain than the frequency domain as far as audio is concerned, but I didn't spend much time explaining how you might implement a time-domain EQ. That's what I'm going to do now.
The theory behind time-domain filters could fill a book. Instead of trying to cram you full of theory we'll just skip ahead to what you need to know to do it. I'll assume you already have some idea of what a filter is.
Audio EQ Cookbook
The Audio EQ Cookbook by Robert Bristow-Johnson is a great, albeit very terse, description of how to build basic audio EQs. These EQs can be described as second order digital filters, sometimes called "biquads" because the equation that describes them contains two quadratics. In audio, we sometimes use other kinds of filters, but second order filters are a real workhorse. First order filters don't do much: they generally just allow us to adjust the overall balance of high and low frequencies. This can be useful in "tone control" circuits, like you might find on some stereos and guitars, but not much else. Second order filters give us more control -- we can "dial in" a specific frequency, or increase or decrease frequencies above and below a certain threshold, with a fair degree of accuracy, for example. If we need even more control than a second order filter offers, we can often simply take several second order filters and place them in series to simulate the effect of a single higher order filter.
Notice I said series, though. Don't try putting these filters in parallel, because they not only alter the frequency response, but also the phase response, so when you put them in parallel you might get unexpected results. For example, if you take a so-called all-pass filter and put it in parallel with no filter, the result will not be a flat frequency response, even though you've combined the output of two signals that have the same frequency response as the original signal.
Using the Audio EQ Cookbook, we can design a peaking, high-pass, low-pass, band-pass, notch (or band-stop), or shelving filter. These are the basic filters used in audio. We can even design that crazy all-pass filter I mentioned which actually does come in handy if you are building a phaser. (It has other uses, too, but that's for another post.)
Bell Filter
Let's design a "bell", or "peaking" filter using RBJ's cookbook. Most other filters in the cookbook are either similar to the bell or simpler, so once you understand the bell, you're golden. To start with, you will need to know the sample rate of the audio going into and coming out of your filter, and the center frequency of your filter. The center frequency, in the case of the bell filter, is the frequency that is "most affected" by your filter. You will also want to define the width of the filter, which can be done in a number of ways, usually with some variation on "Q" or "quality factor" and "bandwidth". RBJ's filters define bandwidth in octaves, and you want to be careful that you don't extend the top of the bandwidth above the Nyquist frequency (or 1/2 the sample rate), or your filter won't work. We also need to know how much of our center frequency to add in dB (if we want to remove, we just use a negative value, and for no change, we set that to 0).
Fs = Sample Rate
F0 = Center Frequency (always less than Fs/2)
BW = Bandwidth in octaves
g = gain in dB
Great! Now we are ready to begin our calculations. First, RBJ suggests calculating some intermediate values:
A = 10^(g/40)
w0 = 2*pi*F0/Fs
c = cos(w0)
s = sin(w0)
alpha = s*sinh( ln(2)/2 * BW * w0/s )
This is a great chance to use that hyperbolic sine button on your scientific calculator that, until now, has only been collecting dust. Now that we've done that, we can finally calculate the filter coefficients, which we use when actually processing data:
b0 = 1 + alpha*A
b1 = -2*c
b2 = 1 - alpha*A
a0 = 1 + alpha/A
a1 = -2*c
a2 = 1 - alpha/A
Generally speaking, we want to "normalize" these coefficients, so that a0 = 1. We can do this by dividing each coefficient by a0. Do this in advance or the electrical engineers will laugh at you:
b0 /= a0
b1 /= a0
b2 /= a0
a1 /= a0
a2 /= a0
Now, in pseudocode, here's how we process our data, one sample at a time using a "process" function that looks something like this:
number xmem1, xmem2, ymem1, ymem2;
void reset() {
    xmem1 = xmem2 = ymem1 = ymem2 = 0;
}
number process( number x ) {
    number y = b0*x + b1*xmem1 + b2*xmem2 - a1*ymem1 - a2*ymem2;
    xmem2 = xmem1;
    xmem1 = x;
    ymem2 = ymem1;
    ymem1 = y;
    return y;
}
You'll probably have some kind of loop that your process function goes in, since it will get called once for each audio sample.
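One way that loop might look in C is sketched below. The `biquad` struct and the function names are my own for this example. Bundling the coefficients and memory into a struct, instead of using globals as in the pseudocode, is just one possible design, chosen here so that several filters can run side by side.

```c
#include <stddef.h>

/* Coefficients and state for one Direct Form I biquad. */
typedef struct {
    double b0, b1, b2, a1, a2;          /* normalized so that a0 == 1 */
    double xmem1, xmem2, ymem1, ymem2;  /* previous inputs and outputs */
} biquad;

/* Same arithmetic as the process function above, for one sample. */
static double biquad_process(biquad *f, double x)
{
    double y = f->b0 * x + f->b1 * f->xmem1 + f->b2 * f->xmem2
             - f->a1 * f->ymem1 - f->a2 * f->ymem2;
    f->xmem2 = f->xmem1;
    f->xmem1 = x;
    f->ymem2 = f->ymem1;
    f->ymem1 = y;
    return y;
}

/* Filter `count` samples in place, calling the process function once each. */
void biquad_process_buffer(biquad *f, float *samples, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        samples[i] = (float)biquad_process(f, samples[i]);
}
```

Because the memory lives in the struct and is not reset between calls, you can hand this function one block of audio after another and the filter carries on where it left off.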
There's actually more than one way to implement the process function given that particular set of coefficients. This implementation is called "Direct Form I" and happens to work pretty darn well most of the time. "Direct form II" has some admirers, but those people are either suffering from graduate-school-induced trauma or actually have some very good reason for doing what they are doing that in all likelihood does not apply to you. There are of course other implementations, but DFI is a good place to start.
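For comparison, here is a sketch of what Direct Form II could look like with the same normalized coefficients. The names `biquad_df2` and `biquad_df2_process` are invented for this example. In exact arithmetic it produces the same output as Direct Form I; the two differ in how many values they store and in how rounding errors behave.

```c
/* Direct Form II biquad: two stored values instead of four. */
typedef struct {
    double b0, b1, b2, a1, a2;  /* normalized so that a0 == 1 */
    double w1, w2;              /* one delay line shared by both halves */
} biquad_df2;

double biquad_df2_process(biquad_df2 *f, double x)
{
    /* Feedback half first, producing an intermediate value w. */
    double w = x - f->a1 * f->w1 - f->a2 * f->w2;

    /* Feedforward half reads from the same delay line. */
    double y = f->b0 * w + f->b1 * f->w1 + f->b2 * f->w2;

    f->w2 = f->w1;
    f->w1 = w;
    return y;
}
```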
You may have noticed that the output of the filter, y, is stored and used as an input to future iterations. The filter is therefore "recursive". This has several implications:
- The filter is fairly sensitive to errors in the recursive values and coefficients. Because of this, we need to take care of what happens with the error in our y values. In practice, on computers, we usually just need to use a high resolution floating point value (i.e., double precision) to store these (on fixed point hardware, it is often another matter).
- Another issue is that you can't just blindly set the values of your coefficients, or your filter may become unstable. Fortunately, the coefficients that come out of RBJ's equations always result in stable filters, but don't go messing around. For example, you might be tempted to interpolate coefficients from one set of values to another to simulate a filter sweep. Resist this temptation or you will unleash the numerical fury of hell! The values in between can be "unstable", meaning that your output will run off to infinity. Madness, delirium, vomiting and broken speakers are often the unfortunate casualties.
- On some platforms you will have to deal with something called "denormal" numbers. This is a major pain in the ass, I'm sorry to say. Basically it means your performance will be between 10 and 100 times worse than it should be because the CPU is busy calculating tiny numbers you don't care about. This is one of the rare cases where I would advocate optimizing before you measure a problem, because the issue can appear and disappear as your code changes and it is very hard to trace. In this case, the easiest solution is probably to do something like this (imagine we are in C for a moment):
/* true when the exponent bits of a 32-bit float are all zero (zero or denormal) */
#define IS_DENORMAL(f) (((*(unsigned int *)&(f))&0x7f800000) == 0)
float xmem1, xmem2, ymem1, ymem2;
void reset() {
    xmem1 = xmem2 = ymem1 = ymem2 = 0;
}
float process( float x ) {
    float y = b0*x + b1*xmem1 + b2*xmem2 - a1*ymem1 - a2*ymem2;
    /* flush tiny results to exactly zero before they are fed back */
    if( IS_DENORMAL( y ) )
        y = 0;
    xmem2 = xmem1;
    xmem1 = x;
    ymem2 = ymem1;
    ymem1 = y;
    return y;
}
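On the stability point in the list above: if you ever do end up with coefficients that did not come straight out of the cookbook equations, one sanity check is the standard "stability triangle" test for second order filters. The sketch below assumes the cookbook naming used in this post (a1 and a2 are the feedback coefficients, already divided by a0). The function name `biquad_is_stable` is made up for this example.

```c
#include <math.h>

/* Returns 1 if both poles of 1 + a1*z^-1 + a2*z^-2 lie strictly inside
 * the unit circle, and 0 otherwise. a1 and a2 must already be
 * normalized (divided by a0). */
int biquad_is_stable(double a1, double a2)
{
    return fabs(a2) < 1.0 && fabs(a1) < 1.0 + a2;
}
```

Keep in mind that this only describes a filter whose coefficients stay put. It says nothing about what happens while coefficients are changing from one sample to the next.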
Okay, happy filtering!
Wednesday, August 8, 2012
Why EQ Is Done In the Time Domain
In my last post, I discussed how various audio processing may be best done in the frequency or time domain. Specifically, I suggested that EQ, which is a filter that alters the frequency balance of a signal, is best done in the time domain, not the frequency domain. (See my next post if you want to learn how to implement a time-domain filter.)
If this seems counterintuitive to you, rest assured you are not alone. I've been following the "audio" and "FFT" tags (among others) on Stack Overflow, and it's clear that many people attempt to implement EQs in the frequency domain, only to find that they run into a variety of problems.
Frequency Domain Filters
Let's say you want to eliminate or reduce high frequencies from your signal. This is called a "low-pass" filter, or, less commonly, a "high-cut" filter. In the frequency domain, high frequencies get "sorted" into designated "bins", where you can manipulate them or even set them to zero. This seems like an ideal way to do low-pass filtering, but let's explore the process to see why it might not work out so well.
Our first attempt at a low-pass filter, implemented with the FFT, might look something like this:
- loop on audio input
- if enough audio is received, perform FFT, which gives us audio in the frequency domain
- in frequency domain, perform manipulations we want. In the case of eliminating high frequencies, we set the bins representing high frequencies to 0.
- perform inverse FFT, to get audio back in time domain
- output that chunk of audio
But there are quite a few problems with that approach:
- We must wait for a chunk of audio before we can even begin processing, which means that we will incur latency in our processing. The higher the quality of filter we want, the more audio we need to wait for. If the input buffer size does not match the FFT size, extra buffering needs to be done.
- The FFT, though efficient compared to the DFT (which is the FFT without the "fast" part), takes more than linear time (O(n log n)), and we need to do both the FFT and its inverse, which is computationally similar. EQing with the FFT is therefore generally very inefficient compared to comparable time-domain filters.
- Because our output chunk has been processed in the frequency domain independent of samples in neighboring chunks, the audio in neighboring chunks may not be continuous. One solution is to process the entire file as one chunk (which only works for offline, rather than real-time, processing, and is computationally expensive). The better solution is the OLA, or Overlap Add, method, but this involves complexity that many people miss when implementing a filter this way.
- Filters implemented via FFT, as well as time-domain filters implemented via IFFT, often do not perform the way people expect. For example, many people expect that if they set all values in bins above a certain frequency to 0, then all frequencies above the given frequency will be eliminated. This is not the case. Instead, frequency responses at the bin values will be 0, but the frequency response between those values is free to fluctuate -- and it does fluctuate, often greatly. This fluctuation is called "ripple." There are techniques for reducing ripple, but they are complex, and they don't eliminate ripple. Note that, in general, frequencies across the entire spectrum are subject to ripple, so even just manipulating a small frequency band may create ripple across the entire frequency spectrum.
- FFT filters suffer from so-called "pre-echo", where sounds can be heard before the main sound hits. In and of itself, this isn't really a problem, but sounds are "smeared" so badly by many designs that many in the audio world feel that these filters can affect the impact of transients and stereo imaging if not implemented and used correctly.
As a side note, one case where it might be worth all that work is a special case of so-called FIR filters (also sometimes called "linear phase" filters). These are sometimes used in audio production and in other fields. In audio, they are usually used only in mastering because of their high latency and computational cost, but even then, many engineers don't like them (while others swear by them). FIR filters are best implemented in the time domain as well, until the number of "taps" in the filter becomes enormous, which it sometimes does, at which point it actually becomes more efficient to implement them using an FFT with OLA. FIR filters suffer from many of the problems mentioned above, including pre-echo, high computational cost and latency, but they do have some acoustical properties that make them desirable in some applications.
Time Domain Filters
Let's try removing high frequencies in the time domain instead. In the time domain, high frequencies are represented by the parts of the signal that change quickly, and low frequencies are represented by the parts that change slowly. One simple way to remove high frequencies, then, would be to use a moving average filter:
y(n) = { x(n) + x(n-1) + ... + x(n-M) } / (M+1)
where x(i) is your input sample at time i, and y(i) is your output sample at time i. No FFT required for that. (This is not the best filter for removing high frequencies -- in fact we can do WAY better -- but it is my favorite way to illustrate the point. The moving average filter is not uncommon in economics, image processing and other fields partly for this reason.) Several advantages are immediately obvious, and some are not so obvious:
- Each input sample can be processed one at a time to produce one output sample without having to chunk or wait for more audio. Therefore, there are also no continuity issues and minimal latency.
- It is extremely efficient, with only a few multiplies, adds and memory stores/retrievals required per sample.
- These filters can be designed to closely mimic analog filters.
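Here is one way the moving average might look in C, processing one sample at a time. This is a sketch: the order MA_ORDER = 3 and the names MovingAverage, ma_reset and ma_process are placeholders of mine. It keeps a running sum so that each sample costs one add, one subtract and one divide, no matter how long the window is.

```c
#define MA_ORDER 3  /* hypothetical M: the filter averages M+1 = 4 samples */

typedef struct {
    float history[MA_ORDER + 1]; /* the last M+1 inputs, as a ring buffer */
    int pos;                     /* index of the oldest sample */
    double sum;                  /* running sum of everything in history */
} MovingAverage;

void ma_reset(MovingAverage *f) {
    for (int i = 0; i <= MA_ORDER; i++)
        f->history[i] = 0.0f;
    f->pos = 0;
    f->sum = 0.0;
}

/* Takes one input sample and returns one output sample:
   y(n) = { x(n) + x(n-1) + ... + x(n-M) } / (M+1) */
float ma_process(MovingAverage *f, float x) {
    /* add the newest sample and drop the oldest from the running sum */
    f->sum += (double)x - (double)f->history[f->pos];
    f->history[f->pos] = x;
    f->pos = (f->pos + 1) % (MA_ORDER + 1);
    return (float)(f->sum / (MA_ORDER + 1));
}
```

Feed it a constant and, once the window fills, you get the same constant back. Feed it the fastest possible wiggle (+1, -1, +1, -1, ...) and the output settles to zero. That is the low-pass behavior in miniature.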
In the end, the general rule is that for a given performance, you can get much better results in the time domain than in the frequency domain.
Saturday, August 4, 2012
When to (not) use the FFT
In the last post I discussed one use for the FFT: pitch tracking. I also mentioned that there were better ways to do pitch tracking. Indeed, aside from improvements on that method, you could also use entirely different methods that don't rely on the FFT at all.
The FFT transforms data into the "frequency domain", or, if your data is broken down into chunks, the FFT transforms it into the "time-frequency domain," which we often still think of as the frequency domain. However, the most basic "domain" you can work in is usually the "time domain." In the time domain, audio is represented as a sequence of amplitude values. You may know this as "PCM" audio. This is what's usually stored in WAV and AIFF files, and when we access audio devices like soundcards, this is the most natural way to transfer data. It turns out we can also do a whole lot of processing and analysis in the time domain as well.
| Process | Time Domain | Frequency Domain |
|---|---|---|
| Filtering/EQ | Yes! | No! |
| Pitch Shifting | Okay | Okay |
| Pitch Tracking | Okay | Okay |
| Reverb (Simulated) | Yes! | No! |
| Reverb (Impulse) | No! | Yes! |
| Guitar effects (chorus/flanger/distortion/etc.) | Yes! | No! |
| Sample rate conversion | Yes! | No! |
| Compression | Yes! | No! |
| Panning, Mixing, etc | Yes! | No! |
Wow, so impulse reverb is really the only thing on that list you need an FFT for? Actually even that can be done in the time domain, it's just much more efficient in the frequency domain (so much so that it might be considered impossible in the time domain).
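To see why, here is what impulse reverb looks like as plain time-domain convolution. This is only a sketch, and the function name convolve and its signature are my own. Every output sample needs one multiply-add per sample of the impulse response. As an illustrative example with numbers I've picked, a 3-second impulse response at 44.1 kHz has 132,300 samples, which works out to roughly 5.8 billion multiply-adds per second of audio.

```c
#include <assert.h>
#include <stddef.h>

/* Direct (time-domain) convolution of a dry signal x with an impulse
   response h. y must have room for nx + nh - 1 samples. */
void convolve(const float *x, size_t nx, const float *h, size_t nh, float *y) {
    assert(nx > 0 && nh > 0);
    for (size_t n = 0; n < nx + nh - 1; n++) {
        double acc = 0.0;
        for (size_t k = 0; k < nh; k++) {
            /* only use input samples that actually exist */
            if (n >= k && n - k < nx)
                acc += (double)h[k] * x[n - k];
        }
        y[n] = (float)acc;
    }
}
```

The inner loop runs once per impulse response sample for every output sample, and that is the cost the FFT-based approach avoids.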
You might wonder how to adjust the frequency balance of a signal, which is what an EQ does, in the time domain rather than the frequency domain. Well, you can do it in the frequency domain, but you are asking for trouble. I'll talk about this in my next post.
