bjorg, by Bjorn Roche (<a href="http://www.blogger.com/profile/17072425815152893296">profile</a>)
<h2>Solving acoustics problems (2013-11-20)</h2>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: justify;"><tbody>
<tr><td style="text-align: center;"><a href="http://3.bp.blogspot.com/-3G3-yfl7B60/Uo0iSBXGaEI/AAAAAAAAAEs/bqVBHHWRIPI/s1600/art_dan2.gif" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="251" src="http://3.bp.blogspot.com/-3G3-yfl7B60/Uo0iSBXGaEI/AAAAAAAAAEs/bqVBHHWRIPI/s320/art_dan2.gif" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">A "waterfall plot" like this one is one of many tools used by<br />
acousticians to determine the problems with a room.<br />
Photo from <a href="http://realtraps.com/">realtraps</a> which provides high quality bass traps,<br />
an important type of acoustic treatment.</td></tr>
</tbody></table>
<div style="text-align: justify;">
I recently received the following letter (edited):</div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i> </i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><br /></i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i>Greetings,</i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><br /></i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>The echo in my local church is really bad. I am lucky if I can understand 10% of what’s being said. I have checked with other members of the congregation and without exception they all have the same problem. </i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><br /></i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>The church is medium size with high vaulted ceiling, very large windows with pillars spaced throughout. The floor is mostly wood. The speakers are flat against the side walls, spaced approx 15 metres apart and approx 10 feet above the floor.</i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><br /></i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>The speakers are apparently ‘top of the range’… I just wonder if a graphic equalizer was used between the microphone and speaker, would this ‘clean up’ the sound a little?</i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><br /></i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>I know that lining the walls with acoustic tiles and carpeting the floor would lessen the echo, but, we don’t want to do that if we can avoid it.</i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><br /></i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>With regard to putting carpet on the floor, my thoughts are that instead of sound being absorbed by the carpet, the congregation present would absorb just as much as the carpet?. One other theory I have is regarding the speakers.</i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><br /></i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><span class="Apple-tab-span" style="white-space: pre;"> </span>If the speakers were moved…</i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i><br /></i></span></div>
<div style="text-align: justify;">
<span style="font-size: x-small;"><i>Michael</i></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Hey Michael,</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I sympathize with you. Going to service every week and not being able to understand what is being said must be very frustrating. While this is not the kind of thing I do every day, I do have some training in this area and will do my best to give you something helpful.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Most churches are built with little attention to acoustics, and old churches were built before there was any real understanding of acoustics at all. With all those reflective surfaces and no care taken to prevent the acoustic problems they create, problems are inevitable, and sometimes, as in your church, they simply get out of hand. In a situation like that, even a great sound system won't be able to solve the problem.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I recommend you hire a professional in your area to come look at the space and give more specific feedback. Having them improve the situation may cost anywhere from hundreds to tens of thousands of dollars (or even more), depending on the cause of the problem. However, it's helpful to have some idea of the possible solutions so that when you hire that professional you are prepared for what's to come. You might even be able to do some more research and take a stab at solving these issues yourself.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
For example, it might be useful to listen to the room and conjecture, even without measurements, whether the problem is bound to specific frequencies or is simply one of too many echoes. If you are a trained listener, you might stand in various places in the room, clap loudly, and listen to get a sense of this. Even a trained listener would never substitute such methods for actual measurements, but I often find them useful for developing a hypothesis (e.g. I might listen and say "I believe there is a problem in the low frequencies" before measuring, then use measurements to confirm or reject this hypothesis). Also, look at the room: are there lots of parallel walls? If so, you are likely suffering from problems at specific frequencies, and it's possible that a targeted, and probably less expensive, approach will help.</div>
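<div style="text-align: justify;">
To make the parallel-wall point concrete: the strongest axial room modes between two parallel surfaces fall at multiples of c/2L, where c is the speed of sound and L is the wall spacing. This little sketch (my own illustration, with a made-up function name; it is not from the original letter) computes those frequencies:</div>

```c
/* Axial room-mode frequencies between two parallel walls.
   n = mode number (1, 2, 3, ...), wall_spacing_m = distance in metres.
   Uses c ~= 343 m/s (speed of sound at room temperature). */
double axial_mode_hz( int n, double wall_spacing_m )
{
    const double c = 343.0; /* m/s, approximate */
    return n * c / ( 2.0 * wall_spacing_m );
}
```

<div style="text-align: justify;">
For 17.15-metre wall spacing, the first mode lands at 10 Hz, the second at 20 Hz, and so on; treatments targeting those frequencies can then be considered.</div>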
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Another thing you can do is find someone with some stage acting experience and have them speak loudly and clearly at the pulpit. Have them do this both with and without the sound system and listen to the results. If they sound much clearer without the sound system than with it, that suggests the sound system itself may be causing at least some of the problems.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="white-space: pre;">If you can't afford an acoustician, but you are willing to experiment a bit, this kind of testing might lead you to something. For example, maybe you notice some large open parallel walls and you agree that covering one or both of them with some heavy draperies is either acceptable or would look nice. You could try it and see if it helps. It's no guarantee, but it might make a difference. Draperies are, of course, unlikely to make that much difference by themselves, so you might consider putting acoustic absorbing material behind them.</span></div>
<div style="text-align: justify;">
<span style="white-space: pre;"><br /></span></div>
<div style="text-align: justify;">
<span style="white-space: pre;">Be warned, however, that acoustic treatments done by amateurs without measurements are often beset with problems. For example, you may reduce the overall reverberation time but leave lots of long echoes at certain frequencies. This can yield results that are no better than where you started, and possibly even worse (although in your case I think that's unlikely).</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Here are the types of things a professional is likely to recommend. You've already alluded to all of them, but I'll repeat them with some more detail. I put them roughly in order of how likely they are to help, but it does depend on your specific situation:</div>
<div style="text-align: justify;">
</div>
<ul>
<li><b>Acoustic treatments.</b> Churches like the one you describe are notorious for highly reflective surfaces like stone and glass, and as you surmised, adding absorptive materials to the walls, floors and ceiling will reduce the echo significantly. Also as you surmised, floor covering may be of limited effectiveness, since people also absorb and diffuse sound; of course, it depends on how much of the floor they cover and where. I understand your hesitation to go this route, since it may impact the aesthetics of the church and may be expensive. But as I mentioned above, depending on the specific situation, you may be able to achieve a dramatic acoustic improvement with relatively little visual impact, and depending on the treatment needed you may be able to keep your costs controlled. You should also be able to collaborate with someone who can create acoustic treatments that are either not noticeable or that enhance the aesthetics of your space. (Of course, you'll also need someone familiar with things like local fire codes!)</li>
<li><b>Adjusting the speakers.</b> It's certainly possible that putting the speakers in another location would help. If they were hung by a contractor or someone else who did not take acoustics into account, they are likely to be placed poorly; location matters more than the quality of the speakers themselves. Also, if the speakers are not in one cluster at the front, adding the appropriate delay to each set of speakers can help ensure that sound arrives "coherently" from all speakers, which can improve intelligibility significantly. Devices that provide this kind of delay, along with many other features, are sold under names such as "speaker processors" and "speaker array controllers."</li>
<li><b>Electronic tools.</b> Although this is likely to be the least effective approach, you can usually achieve some improvement with EQ, as you suggested. For permanent installations, I prefer parametric EQs, but a high-quality graphic EQ will also work. An ad-hoc technique for setting the EQ is to increase the gain until you hear feedback, and then notch out the frequency that causes the feedback. Continue increasing the gain until you are happy with the results. You must be very careful to protect your speakers and your hearing when using this technique; both can be easily damaged if you don't know what you are doing. Most speaker processors have built-in parametric EQs, and some even come with a calibrated mic that the device can use to adjust the settings automatically. I've done this, and it works great, especially with a little manual tweaking, but you do have to know what you are doing. And, of course, you can't work miracles in a bad room.</li>
</ul>
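<div style="text-align: justify;">
As a rough illustration of the delay idea mentioned under "Adjusting the speakers": the delay needed to time-align a distributed speaker with the main cluster is just the extra path length divided by the speed of sound. This is a back-of-the-envelope sketch with an invented function name, not the configuration interface of any real speaker processor:</div>

```c
/* Delay (in milliseconds) needed to time-align a speaker that is
   `extra_metres` closer to the listener than the main cluster.
   Assumes ~343 m/s speed of sound; name and interface are illustrative. */
double alignment_delay_ms( double extra_metres )
{
    const double speed_of_sound = 343.0; /* m/s, approximate */
    return 1000.0 * extra_metres / speed_of_sound;
}
```

<div style="text-align: justify;">
A speaker 3.43 metres closer than the cluster would need roughly 10 ms of delay; a speaker processor would then be dialled in near that value and fine-tuned by ear or by measurement.</div>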
<h2>Mapping Parameters (2013-09-21)</h2>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="http://2.bp.blogspot.com/-6eiXwhp5NGw/Uj26GgtrA_I/AAAAAAAAAEY/LY2W9VlyQg8/s1600/linear+mapping.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="http://2.bp.blogspot.com/-6eiXwhp5NGw/Uj26GgtrA_I/AAAAAAAAAEY/LY2W9VlyQg8/s1600/linear+mapping.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Visualizing a Linear Mapping</td></tr>
</tbody></table>
Very often we need to "map" one set of values to another. For example, we might have a slider that ranges from 0 to 1 that we want to use to control a frequency setting. Or perhaps we have the output of a sine wave (which ranges from -1 to 1) and we want to use it to control the intensity of an EQ. In these cases and many more, we can use a linear mapping to get from one range of values to another.<br />
<br />
A linear mapping is simply a linear equation, such as <span style="font-family: Courier New, Courier, monospace;">y = mx + b</span>, that takes an input, your slider value for example, and gives you back an output. The input is <span style="font-family: Courier New, Courier, monospace;">x</span>, and the output is <span style="font-family: Courier New, Courier, monospace;">y</span>. The trick is to find the values of <span style="font-family: Courier New, Courier, monospace;">m</span> and <span style="font-family: Courier New, Courier, monospace;">b</span>.<br />
<br />
Let's take a concrete example. Let's say you have the output of a sine wave (say from an LFO) that oscillates between -1 and 1. Now we want to use those values to control a frequency setting from 200 to 2000. In this case, <span style="font-family: Courier New, Courier, monospace;">x</span> from the equation above represents the oscillator, and <span style="font-family: Courier New, Courier, monospace;">y</span> represents the frequency setting.<br />
<br />
We know two things: we want <span style="font-family: Courier New, Courier, monospace;">x=-1</span> to map to <span style="font-family: Courier New, Courier, monospace;">y=200</span>, and <span style="font-family: Courier New, Courier, monospace;">x=1</span> to map to <span style="font-family: Courier New, Courier, monospace;">y=2000</span>. Since our original equation, <span style="font-family: Courier New, Courier, monospace;">y = mx + b</span>, had two unknowns (<span style="font-family: Courier New, Courier, monospace;">m</span> and <span style="font-family: Courier New, Courier, monospace;">b</span>), we can solve it:<br />
<br />
Original equation with both unknowns:<br />
<span style="font-family: Courier New, Courier, monospace;">y = mx + b</span><br />
<br />
Substituting our known values for <span style="font-family: Courier New, Courier, monospace;">x</span> and <span style="font-family: Courier New, Courier, monospace;">y</span>:<br />
<span style="font-family: Courier New, Courier, monospace;">200 = (-1)m + b</span><br />
<span style="font-family: Courier New, Courier, monospace;">2000 = (1)m + b</span><br />
<br />
Solving for <span style="font-family: Courier New, Courier, monospace;">b</span>:<br />
<span style="font-family: Courier New, Courier, monospace;">2200 = 2b</span><br />
<span style="font-family: Courier New, Courier, monospace;">1100 = b</span><br />
<br />
Solving for <span style="font-family: Courier New, Courier, monospace;">m</span>:<br />
<span style="font-family: Courier New, Courier, monospace;">2000 = m + 1100</span><br />
<span style="font-family: Courier New, Courier, monospace;">900 = m</span><br />
<br />
Final equation:<br />
<span style="font-family: Courier New, Courier, monospace;">y = 900x + 1100</span><br />
<br />
You can check the final equation by substituting -1 and 1 for <span style="font-family: Courier New, Courier, monospace;">x</span> and making sure you get 200 and 2000 respectively for <span style="font-family: Courier New, Courier, monospace;">y</span>.<br />
<br />
So in our LFO/frequency example, we would take our LFO value, say 0.75, and use it as <span style="font-family: Courier New, Courier, monospace;">x</span>. Plugging it into the formula gives <span style="font-family: Courier New, Courier, monospace;">y = 900(0.75) + 1100 = 1775</span>, our final value for the frequency setting.<br />
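The whole procedure can be wrapped in a small helper. Here is a sketch in C (the function name is my own) that solves for m and b from any two known points and then applies the mapping:<br />

```c
/* Map x through the line passing (x1,y1) and (x2,y2): y = m*x + b. */
double lin_map( double x, double x1, double y1, double x2, double y2 )
{
    const double m = ( y2 - y1 ) / ( x2 - x1 );  /* slope */
    const double b = y1 - m * x1;                /* intercept */
    return m * x + b;
}
```

For example, <span style="font-family: Courier New, Courier, monospace;">lin_map( 0.75, -1, 200, 1, 2000 )</span> returns 1775, matching the worked example above.<br />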
<h2>Peak Meters, dBFS and Headroom (2013-07-21)</h2>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-Hwhn0tZB8I4/Uevn4E5VKLI/AAAAAAAAADY/lYBZKI1EU4Q/s1600/level_meter_horizontal.gif" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="http://4.bp.blogspot.com/-Hwhn0tZB8I4/Uevn4E5VKLI/AAAAAAAAADY/lYBZKI1EU4Q/s320/level_meter_horizontal.gif" height="129" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The level meter from audiofile engineering's<br />
<a href="http://www.audiofile-engineering.com/spectre/">spectre</a> program accurately shows peak values<br />
in dBFS</td></tr>
</tbody></table>
Level meters are one of the most basic features of digital audio software. In software, they are very often implemented as peak meters, which are designed to track the maximum amplitude of the signal. Other kinds of meters, such as VU meters, are often simulations of analog meters. Loudness meters, which attempt to estimate our perception of volume rather than volume itself, are also becoming increasingly common. You may also come across RMS and average meters. In this post, I'm only going to talk about peak meters.<br />
<h3>
Peak Meters</h3>
Peak meters are useful in digital audio because they show the user information that is closely associated with the limits of the medium and because they are efficient and easy to implement. Under normal circumstances, we can expect peak meters to correspond pretty well with our perception of volume, but not perfectly. The general expectation users have when looking at peak meters is that if a signal goes above a certain level at some point, that level should be indicated on the meters. In other words, if the signal goes as high as, say -2 dBFS, over some time period, then someone watching the peak meter during that time will see the meter hit the -2 dBFS mark (see below for more on dBFS). Many peak meters have features such as "peak hold" specifically designed so that the user does not need to stare at the meter.<br />
<br />
Beyond that, there are rarely any specifics. Some peak meters show their output linearly, some show their output in dB. Some use virtual LEDs, some a bar graph. In general, if there is a numeric readout or units associated with the meter, the unit should be dBFS.<br />
<br />
Now that we know the basics of peak meters, let's figure out how to implement them.<br />
<h3>
Update Time</h3>
Peak meters should feel fast and responsive. However, they don't update instantly. In software, it is not uncommon to have audio samples run at 44100 samples per second while the display refreshes at only 75 times per second, so there is absolutely no point in showing the value of each sample (not to mention the fact that our eyes couldn't keep up). Clearly we need to figure out how to represent a large number of samples with only one value. For peak meters, we do this as follows:<br />
<br />
<ol>
<li>Figure out how often we want to update. For example, every 100 ms (.1s) is a good starting point, and will work well most of the time.</li>
<li>Figure out how many samples we need to aggregate for each update. If we are sampling at 44100 Hz, a common rate, and want to update every .1s, we need N = 44100 * .1 = 4410 samples per update.</li>
<li>Loop on blocks of size N. Find the peak in each block and display that peak. If the graphics system does not allow us to display a given peak, the next iteration should display the max of any undisplayed peaks.</li>
</ol>
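Steps 2 and 3 above can be sketched in C like this (helper names are my own; a real meter would call these from its audio callback and display code):<br />

```c
/* Step 2: samples to aggregate per meter update. */
int samples_per_update( int sample_rate_hz, double update_seconds )
{
    return (int)( sample_rate_hz * update_seconds + 0.5 ); /* round to nearest */
}

/* Step 3: if the display missed a peak, carry the larger value forward
   so the next redraw shows the max of all undisplayed peaks. */
float merge_undisplayed_peak( float pending_peak, float new_block_peak )
{
    return new_block_peak > pending_peak ? new_block_peak : pending_peak;
}
```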
<h3>
Finding the Peak</h3>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-kvyZ2_g1aX4/UevoppbFXVI/AAAAAAAAADk/novw4OjuOpo/s1600/loudspeaker-waveform.gif" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="http://1.bp.blogspot.com/-kvyZ2_g1aX4/UevoppbFXVI/AAAAAAAAADk/novw4OjuOpo/s320/loudspeaker-waveform.gif" height="150" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Sound is created by air pressure swinging both above<br />
and below the mean pressure.</td></tr>
</tbody></table>
<div>
Finding the peak of each block of N samples is the core of peak metering. To do so, we can't simply find the maximum value of all samples because sound waves contain not just peaks, but also troughs. If those troughs go further from the mean than the peaks, we will underestimate the peak.</div>
<div>
<br /></div>
<div>
The solution to this problem is simply to take the absolute value of each sample, and then find the max of those absolute values. In code, it would look something like this:</div>
<div>
<br />
<br />
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">float max = 0;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">for( int i=0; i&lt;buf.size(); ++i ) {</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;const float v = fabsf( buf[i] ); // C's abs() is for ints; use fabsf for floats</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;if( v &gt; max )</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;max = v;</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span></div>
<div>
<br /></div>
<div>
At the end of this loop, max is your peak value for that block, and you can display it on the meter, or, optionally, calculate its value in dBFS first.</div>
<h3>
Calculating dBFS or Headroom</h3>
<div>
(For a more complete and less "arm-wavy" intro to decibels, try <a href="http://www.animations.physics.unsw.edu.au/jw/dB.htm">here</a> or <a href="http://en.wikipedia.org/wiki/Decibel">here</a>.) The standard unit for measuring audio levels is the decibel, or dB. But the dB by itself is something of an incomplete unit because, loosely speaking, instead of telling you the amplitude of something, dB tells you the amplitude of something relative to something else. Therefore, to say something has an amplitude of 3 dB is meaningless. Even saying it has an amplitude of 0 dB is meaningless. You always need some point of reference. In digital audio, the standard point of reference is "Full Scale", i.e., the maximum value that digital audio can take on without clipping. If you are representing your audio as floats, full scale is nominally calibrated to +/- 1.0. We call this scale dBFS. To convert the above max value (which is always positive because it comes from an absolute value) to dBFS, use this formula:</div>
<div>
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">dBFS = 20 * log10(max);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: inherit;">You may find it odd that the loudest a signal can normally be is 0 dBFS, but this is how it is. You may find it useful to think of dBFS as "headroom", i.e., answering the question "how many dB can I add to the signal before it reaches the maximum?" (Headroom is actually equal to -dBFS, but I've often seen headroom labeled as dBFS when the context makes it clear.)</span></div>
<h2>The ABCs of PCM (Uncompressed) digital audio (2013-05-30)</h2>
Digital audio can be <a href="http://en.wikipedia.org/wiki/Audio_file_format">stored in a wide range of formats</a>. If you are a developer interested in doing anything with audio, whether it's changing the volume, editing chunks out, looping, mixing, or adding reverb, you absolutely must understand the format you are working with. That doesn't mean you need to understand all the details of the <i>file</i> format, which is just a container for the audio and can be read by a library. It does mean you need to understand the <i>data</i> format you are working with. This blog post is designed to give you an introduction to working with audio data formats.<br />
<h3>
Compressed and Uncompressed Audio</h3>
<div>
Generally speaking, audio comes in two flavors: compressed and uncompressed. Compressed audio can further be subdivided into different kinds of compression: lossless, which preserves the original content exactly, and lossy which achieves more compression at the expense of degrading the audio. Of these, lossy is by far the most well known and includes MP3, AAC (used in iTunes), and Ogg Vorbis. Much information can be found online about the various kinds of lossy and lossless formats, so I won't go into more detail about compressed audio here, except to say that there are many kinds of compressed audio, each with many parameters.<br />
<br /></div>
<div>
Uncompressed PCM audio, on the other hand, is defined by two parameters: the <i>sample rate</i> and the <i>bit-depth</i>. Loosely speaking, the sample rate limits the maximum frequency that can be represented by the format, and the bit-depth determines the maximum dynamic range that can be represented by the format. You can think of bit-depth as determining how much noise there is compared to signal.</div>
<div>
<br />
CD audio is uncompressed and uses a 44,100 Hz sample rate and 16-bit samples. What this means is that audio on a CD is represented by 44,100 separate measurements, or samples, taken per second. Each sample is stored as a 16-bit number. Audio recorded in studios often uses a bit depth of 24 bits and sometimes a higher sample rate.<br />
<br /></div>
<div>
<div>
WAV and AIFF files support both compressed and uncompressed formats, but are so rarely used with compressed audio that these formats have become synonymous with uncompressed audio. The most common WAV files use the same parameters as CD audio: 44,100 Hz and bit depth of 16-bits, but other sample rates and bit depths are supported.</div>
</div>
<h3>
Converting From Compressed to Uncompressed Formats</h3>
<div>
As you probably already know, lots of audio in the world is stored in compressed formats like MP3. However, it's difficult to do any kind of meaningful processing on compressed audio. So, in order to change a compressed file, you must uncompress, process, and re-compress it. Every compression step results in degradation, so compressing it twice results in extra degradation. You can use lossless compression to avoid this, but the extra compression and decompression steps are likely to require a lot of CPU time, and the gains from compression will be relatively minor. For this reason, compressed audio is usually used for delivery and uncompressed audio is usually used in intermediate steps.</div>
<div>
<br /></div>
<div>
However, the reality is that sometimes we process compressed audio. Audiophiles and music producers may scoff, but sometimes that's life. For example, if you are working on mobile applications with limited storage space, telephony and VoIP applications with limited bandwidth, or web applications with many free users, you might find yourself needing to store intermediate files in a compressed format. Usually the first step in processing compressed audio, like MP3, is to decompress it. This means converting the compressed format to PCM. Doing this involves a detailed understanding of the specific format. I recommend using a library such as <a href="http://www.mega-nerd.com/libsndfile/">libsndfile</a>, <a href="http://www.ffmpeg.org/">ffmpeg</a> or <a href="http://lame.sourceforge.net/">lame</a> for this step.</div>
<h3>
Uncompressed Audio</h3>
<div>
Most stored, uncompressed audio is 16-bit. Other bit depths, like 8 and 24, are also common, and many others exist. Ideally, intermediate audio would be stored in floating-point format, as is supported by both WAV and AIFF, but the reality is that almost no one does this.<br />
<br /></div>
<div>
Because 16-bit is so common, let's use that as an example to understand how the data is formatted. 16-bit audio is usually stored as packed 16-bit signed integers. The integers may be big-endian (most common for AIFF) or little-endian (most common for WAV). If there are multiple channels, the channels are usually interleaved. For example, in stereo audio (which has two channels, left and right), you would have one 16-bit integer representing the left channel, followed by one 16-bit integer representing the right channel. These two samples represent the same time and the two together are sometimes called a sample frame or simply a frame.<br />
<br /></div>
<div>
<table border="1" cellpadding="0" cellspacing="0" style="width: 100%;">
<tbody>
<tr width="50%">
<td>Sample Frame 1:<br />
<table border="1" cellpadding="0" cellspacing="0" style="width: 100%;">
<tbody>
<tr><td width="25%">Left MSB</td>
<td width="25%">Left LSB</td>
<td width="25%">Right MSB</td>
<td width="25%">Right LSB</td>
</tr>
</tbody></table>
</td>
<td>Sample Frame 2:<br />
<table border="1" cellpadding="0" cellspacing="0" style="width: 100%;">
<tbody>
<tr><td width="25%">Left MSB</td>
<td width="25%">Left LSB</td>
<td width="25%">Right MSB</td>
<td width="25%">Right LSB</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</div>
<div>
2 sample frames of big-endian, 16-bit interleaved audio. Each box represents one 8-bit byte.<br />
<br />
The above example shows 2 sample frames of big-endian, 16-bit interleaved audio. You can tell it's big-endian because the most significant byte (MSB) comes first. It's 16-bit because two 8-bit bytes make up a single sample. It's interleaved because each left sample is followed by the corresponding right sample in the same frame.<br />
<br />
In Java and most C environments, a 16-bit signed integer is represented with the <span style="font-family: Courier New, Courier, monospace;">short</span> datatype. Therefore, to read raw 16-bit data, you will usually want to get the data into an array of <span style="font-family: Courier New, Courier, monospace;">short</span>s. If you are only dealing with C, you can do your I/O directly with <span style="font-family: Courier New, Courier, monospace;">short</span> arrays, or simply use casting or type punning from a raw <span style="font-family: Courier New, Courier, monospace;">char</span> array. In Java, you can use <span style="font-family: Courier New, Courier, monospace;">readShort()</span> from <a href="http://docs.oracle.com/javase/6/docs/api/java/io/DataInputStream.html"><span style="font-family: Courier New, Courier, monospace;">DataInputStream</span></a>.<br />
<br />
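If you have read raw big-endian data into a byte buffer, assembling each sample portably (regardless of the host machine's endianness) might look like this sketch (function name my own):<br />

```c
#include <stdint.h>

/* Assemble one signed 16-bit sample from two big-endian bytes
   (MSB first, as in the layout shown above). */
int16_t sample_from_be_bytes( uint8_t msb, uint8_t lsb )
{
    return (int16_t)( ( (uint16_t)msb << 8 ) | lsb );
}
```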
To store 16-bit stereo interleaved audio in C, you might use a structure like this:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">typedef struct {</span><br />
<span style="font-family: Courier New, Courier, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;short l;</span><br />
<span style="font-family: Courier New, Courier, monospace;">&nbsp;&nbsp;&nbsp;&nbsp;short r;</span><br />
<span style="font-family: Courier New, Courier, monospace;">} stereo_sample_frame_t;</span><br />
<br />
or you might simply have an array of shorts:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">short samples[];</span><br />
<br />
In the latter case, you would just need to be aware that when you index an even number it's the left channel, and when you index an odd number it's the right channel. Iterating through all your data and finding the max on each channel would look something like this:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">int sampleCount = ...//total number of samples = sample frames * channels</span><br />
<span style="font-family: Courier New, Courier, monospace;">int frames = sampleCount / 2 ;</span><br />
<span style="font-family: Courier New, Courier, monospace;">short samples[]; //filled in elsewhere</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">short maxl = 0;</span><br />
<span style="font-family: Courier New, Courier, monospace;">short maxr = 0;</span><br />
<span style="font-family: Courier New, Courier, monospace;">for( int i=0; i&lt;frames; ++i ) {</span><br />
<span style="font-family: Courier New, Courier, monospace;"> maxl = (short) MAX( maxl, abs( samples[2*i] ) );</span><br />
<span style="font-family: Courier New, Courier, monospace;"> maxr = (short) MAX( maxr, abs( samples[2*i+1] ) );</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<span style="font-family: Courier New, Courier, monospace;">printf( "Max left %d, Max right %d.", maxl, maxr );</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
Note how we find the absolute value of each sample. Usually when we are interested in the maximum, we are looking for the maximum deviation from zero, and we don't really care if it's positive or negative -- either way is going to sound equally loud.<br />
<h3>
Processing Raw Data</h3>
You may be able to do all the processing you need to do in the native format of the file. For example, once you have an array of <span style="font-family: Courier New, Courier, monospace;">short</span>s representing the data, you could divide each short by two to cut the volume in half:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">int sampleCount; //total number of samples = sample frames * channels</span><br />
<span style="font-family: Courier New, Courier, monospace;">short samples[]; //filled in elsewhere</span><br />
<br />
<span style="font-family: Courier New, Courier, monospace;">for( int i=0; i&lt;sampleCount; ++i ) {</span></div>
<span style="font-family: Courier New, Courier, monospace;"> samples[i] /= 2 ;</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<br />
<br />
A few things to watch out for:<br />
<br />
<ul>
<li>You must actually use the native format of the file or the proper conversion. You can't simply deal with the data as a stream of bytes. I've seen many questions on Stack Overflow where people make the mistake of dealing with 16-bit audio data byte-by-byte, even though each sample of 16-bit audio is composed of two bytes. This is like adding a multidigit number without the <a href="http://en.wikipedia.org/wiki/Carry_(arithmetic)">carry</a>.</li>
<li>You must watch out for overflow. For example, when increasing the volume, be aware that some samples may end up out of range. You must ensure that all samples remain in the correct range for their datatype. The simplest way to handle this is with clipping (discussed below), which will result in some distortion, but is better than the "wrap-around" that will happen otherwise. (The example above does not have to watch out for overflow because we are dividing, not multiplying.)</li>
<li>Round-off error is virtually inevitable. If you are working in an integer format, e.g. 16-bit, it is almost impossible to avoid. The effects of round-off will be minor but ugly, and eventually these errors will accumulate and become noticeable. The example above will definitely have problems with round-off error.</li>
</ul>
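The clipping mentioned above might be sketched as a small helper (the name <span style="font-family: Courier New, Courier, monospace;">clip16</span> is mine). Clipping distorts, but it beats the wrap-around you get from simply casting an out-of-range int to short:<br />

```c
#include <limits.h>

/* Clamp a wider intermediate value back into 16-bit range before
   storing it in a short. Values past the rails are pinned there
   rather than wrapping around. */
short clip16( int v )
{
    if( v > SHRT_MAX ) return SHRT_MAX;
    if( v < SHRT_MIN ) return SHRT_MIN;
    return (short) v;
}
```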
As long as studio quality isn't your goal, however, you can mix, adjust volume and do a variety of other basic operations without needing to worry too much.<br />
<h3>
Converting and Using Floating Point Samples</h3>
<div>
If you need more powerful or flexible processing, you are probably going to want to convert your samples to floating point. Generally speaking, the nominal range used when audio is represented as floating point numbers is [-1,1].</div>
<div>
<br /></div>
<div>
You don't have to abide by this convention. If you like, you can simply convert your raw data to float by casting:</div>
<div>
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">short s = ... // raw data</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">float f = (float) s;</span></div>
<div>
<br /></div>
<div>
But if you have some files that are 16-bit and some that are 24-bit or 8-bit, you will end up with unexpected results:</div>
<div>
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">char d1 = ... //data from 8-bit file</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">float f1 = (float) d1; </span><span style="font-family: 'Courier New', Courier, monospace;">// now in range [ -128, 127 ]</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">short d2 = ... //data from 16-bit file</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">float f2 = (float) d2; </span><span style="font-family: 'Courier New', Courier, monospace;">// now in range [ -32,768, 32,767 ]</span></div>
<div>
<br /></div>
<div>
It's hard to know how to use <span style="font-family: Courier New, Courier, monospace;">f1</span> and <span style="font-family: Courier New, Courier, monospace;">f2</span> together since their ranges are so different. For example, if you want to mix the two, you most likely won't be able to hear the 8-bit file. This is why we usually scale audio into the [-1,1] range.</div>
<div>
<br /></div>
<div>
There is much <a href="http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html">debate</a> about the right constants to use when scaling your integers, but it's hard to go wrong with this:</div>
<div>
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">int i = ... //data from n-bit file</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">float f = (float) i ;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">f /= M;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: Times, Times New Roman, serif;">where </span><span style="font-family: Courier New, Courier, monospace;">M</span><span style="font-family: Times, Times New Roman, serif;"> is </span><span style="font-family: Courier New, Courier, monospace;">2^(n-1)</span><span style="font-family: Times, Times New Roman, serif;">. Now, </span><span style="font-family: Courier New, Courier, monospace;">f</span><span style="font-family: Times, Times New Roman, serif;"> is guaranteed to be in the range [-1,1]. After you've done your processing, you'll usually want to convert back. To do so, use the same constant and check for out of range values:</span></div>
<div>
<span style="font-family: Times, Times New Roman, serif;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">float f = // processed data</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">f *= M;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">if( f < -M ) f = -M;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">if( f > M-1) f = M-1;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">i = (int) f;</span></div>
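Putting the conversion and the range check together for the 16-bit case (n = 16, so M = 32768) might look like this sketch; the function names are mine:<br />

```c
/* For 16-bit audio, M = 2^(16-1) = 32768. */
#define M 32768.0f

/* Scale a 16-bit sample into the nominal [-1, 1) float range. */
float short_to_float( short s )
{
    return s / M;
}

/* Scale back to 16-bit, clamping out-of-range values so they
   clip rather than wrap around. */
short float_to_short( float f )
{
    f *= M;
    if( f < -M )       f = -M;        /* clamp at SHRT_MIN */
    if( f > M - 1.0f ) f = M - 1.0f;  /* clamp at SHRT_MAX */
    return (short) f;
}
```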
<h3>
<span style="font-family: Times, Times New Roman, serif;">Distortion and Noise</span></h3>
<div>
<span style="font-family: Times, 'Times New Roman', serif;">It's hard to avoid distortion and noise when processing audio. In fact, unless what you are doing is trivial or represents a special case, noise and/or distortion are inevitable. The key is to minimize them, but doing so is not easy. Broadly speaking, noise happens every time you are forced to round, and distortion happens when you change values nonlinearly. We potentially created distortion in the code where we converted from float to integer with a range check, because values outside the range boundary are treated differently from values inside it. The more of the signal that is out of range, the more distortion this introduces. We created noise in the code where we lowered the volume, because dividing by two introduced round-off error. We also introduce noise when we convert from floating point to integer. In fact, many mathematical operations will introduce noise.</span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif;"><br /></span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif;">Any time you are working with integers, you need to watch out for overflows. For example, the following code will mix two input signals represented as an array of shorts. We handle overflows in the same way we did above, by clipping:</span></div>
<div>
<span style="font-family: Times, 'Times New Roman', serif;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">short input1[] = ...//filled in elsewhere</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">short input2[] = ...//filled in elsewhere</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">// we are assuming input1 and input2 have size SIZE or greater</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">short output[ SIZE ];</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">for( int i=0; i&lt;SIZE; ++i ) {</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> int tmp = (int)</span><span style="font-family: 'Courier New', Courier, monospace;">input1[i] + (int)input2[i];</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> if( tmp > SHRT_MAX ) tmp = SHRT_MAX;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> if( tmp < SHRT_MIN ) tmp = SHRT_MIN;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> output[i] = (short) tmp ;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">}</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: inherit;">If it so happens that the signal frequently "clips", then we will hear a lot of distortion. If we want to get rid of distortion altogether, we can eliminate it by dividing by 2. This will reduce the output volume and introduce some round-off noise, but will solve the distortion problem:</span></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">for( int i=0; i&lt;SIZE; ++i ) {</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> int tmp = (int)</span><span style="font-family: 'Courier New', Courier, monospace;">input1[i] + (int)input2[i];</span></div>
<div>
<span style="font-family: 'Courier New', Courier, monospace;"> tmp /= 2;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"> output[i] = (short) tmp ;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">}</span></div>
</div>
<h3>
Notes:</h3>
<div>
A few final notes:</div>
<div>
<ul>
<li>For some reason, WAV files don't support a signed 8-bit format, so when reading and writing WAV files, be aware that 8-bit means unsigned; in virtually all other cases it's safe to assume integers are signed.</li>
<li>Always remember to swap the bytes if the native endian-ness doesn't match the file endian-ness. You'll have to do this again before writing.</li>
<li>When reducing the resolution of data (e.g. casting from float to int, or multiplying an integer by a non-integer), you are introducing noise because you are throwing out data. It might seem as though this will not make much difference, but it turns out that for sampled data in a time-series (like audio) it has a surprising impact. This impact is small enough that for simple audio applications you probably don't need to worry, but for anything studio-quality you will want to understand something called <a href="http://en.wikipedia.org/wiki/Dither">dither</a>, which is the only correct way to solve the problem.</li>
<li>You may have come across <a href="http://www.vttoth.com/CMS/index.php/technical-notes/68">one of these</a> <a href="http://atastypixel.com/blog/how-to-mix-audio-samples-properly-on-ios/" rel="nofollow">unfortunate posts</a>, which claim to have found a better way to mix two audio signals. Here's the thing: there is no secret, magical formula that lets you mix two audio signals, keep them both at their original volume, and still have the mix stay within the same bounds. The correct formula for mixing two signals is the one I described. If volume is a problem, you can either turn up the master volume control on your computer/phone/amplifier/whatever, or use some kind of processing like a <a href="http://en.wikipedia.org/wiki/Dynamic_range_compression">limiter</a>, which will also degrade your signal, but not as badly as the formula in those posts, which produces a terrible kind of distortion (<a href="http://en.wikipedia.org/wiki/Ring_modulation">ring modulation</a>).</li>
</ul>
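As a rough illustration of the dither idea mentioned above, here is a minimal TPDF (triangular probability density function) dither sketch for converting float samples to 16-bit. The function name is mine, and real implementations use a better random source and often noise shaping:<br />

```c
#include <stdlib.h>
#include <limits.h>

/* Add two independent uniform random values spanning +/- 1 LSB
   (their sum has a triangular distribution) before quantizing,
   which decorrelates the quantization error from the signal. */
short float_to_short_dithered( float f )
{
    float scaled = f * 32768.0f;
    /* two uniforms in [-0.5, 0.5] sum to a triangular PDF in [-1, 1] */
    float dither = ( rand() / (float) RAND_MAX - 0.5f )
                 + ( rand() / (float) RAND_MAX - 0.5f );
    float v = scaled + dither;
    if( v > SHRT_MAX ) v = SHRT_MAX;
    if( v < SHRT_MIN ) v = SHRT_MIN;
    return (short) v;
}
```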
</div>
Bjorn Roche<br />
<br />
<h3>
Audio IIR v FIR EQs (2012-11-27)</h3>
<br />
Digital filters come in two flavors: IIR (or "Infinite Impulse Response") and FIR (or "Finite Impulse Response"). Those complex acronyms may confuse you, so let's shed a little light on the situation by defining both and explaining the differences.<br />
<br />
Some people are interested in which is better. Unfortunately, as with many things, there is no easy answer to that question, other than "it depends", and sometimes what it depends on is your ears. I won't stray too deep into field of opinions, but I will try to mention why some people claim one is better than the other and what some of the advantages and disadvantages are in different situations.<br />
<h3>
How Filters Work</h3>
When you design a filter, you start with a set of specifications. To audio engineers, this might be a bit vague, like "boost 1 kHz by 3 dB", but electrical engineers are usually trained to design filters with very specific constraints. However you start, there's usually some long set of equations, and rules used to "design" the filter, depending on what type of filter you are designing and what the specific constraints are (to see one way you might design a filter, see this post on <a href="http://blog.bjornroche.com/2012/08/basic-audio-eqs.html">audio eq design</a>). Once the filter is "designed" you can actually process audio samples.<br />
<h3>
IIR Filters</h3>
Once the filter is designed, the filter itself is implemented as difference equations, like this:<br />
<br />
y[i] = a0 * x[i] + a1 * x[i-1] ... + a<i>n</i> * x[i-n] - b1 * y[i-1] ... - b<i>m</i> * y[i-m].<br />
<br />
In this case, y is an array storing the output, and x is an array storing the input. Note that each output is a linear function of previous inputs and outputs, as well as the current input.<br />
<br />
In order to know the current value of y, we need to know the last value of y, and to know that, we must know still earlier values of y, and so on, all the way back until we reach our initial conditions. For this reason, this kind of filter is sometimes called a "recursive" filter. In principle, this filter can be given a finite input, and it will produce output forever. Because its response is infinite, we call this filter an IIR, or "Infinite Impulse Response" filter.<br />
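To make the recursion concrete, here is a sketch (names mine) of about the simplest possible IIR filter, a one-pole lowpass. Each output depends on the previous output, so a single input impulse keeps decaying forever in exact arithmetic:<br />

```c
/* A one-pole lowpass, the simplest recursive (IIR) filter.
   "a" controls the decay; "prev" holds the previous output y[i-1]. */
typedef struct { double a; double prev; } one_pole_t;

double one_pole_process( one_pole_t *f, double x )
{
    /* y[i] = (1-a)*x[i] + a*y[i-1] */
    f->prev = (1.0 - f->a) * x + f->a * f->prev;
    return f->prev;
}
```

Feed it a single 1.0 followed by zeros and the output is 0.5, 0.25, 0.125, ... (for a = 0.5): a nonzero response for every subsequent sample.<br />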
<br />
(To further confuse the terminology, IIR filters are often designed with certain constraints that make them "minimum phase." While IIR filters are not all minimum phase, many people use the terms "recursive", "IIR" and "minimum phase" interchangeably.)<br />
<br />
Digital IIR filters are often modeled after analog filters. In many ways, analog-modeled IIR filters sound like analog filters. They are very efficient, too: for audio purposes, they usually only require a few multiplies.<br />
<h3>
FIR Filters</h3>
FIR filters, on the other hand, are usually implemented with a difference equation that looks like this:<br />
<br />
y[i] = a0 * x[i] + a1 * x[i-1] + a2 * x[i-2] + ... + a<i>n</i> * x[i-n] + a<i>n</i> * x[i-n-1] + ... + a1 * x[i-2n] + a0 * x[i-2n-1]<br />
<br />
In this case, we don't use previous outputs: in order to calculate the current output, we only need to know the previous <i>n</i> inputs. This may improve the numerical stability of the filter because roundoff errors are not accumulated inside the filter. However, generally speaking, FIR filters are much more CPU intensive for a comparable response, and have some other problems, such as high latency, and both pass-band and stop-band ripple.<br />
<br />
If an FIR filter can be implemented using a difference equation that is symmetrical, like the one above, it has a special property called "linear phase." Linear phase filters delay all frequencies in the signal by the same amount, which is not possible with IIR filters.<br />
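As a sketch of the idea (the coefficients here are an arbitrary symmetric illustration, not a designed filter), an FIR might look like this. There is no feedback, so the response to an impulse dies out after the last tap:<br />

```c
#define NTAPS 5

/* Apply a symmetric (linear-phase) FIR to one input sample.
   "history" holds the last NTAPS inputs; the caller zeroes it
   before the first call. */
double fir_process( const double coeffs[NTAPS], double history[NTAPS], double x )
{
    /* shift the input history and insert the new sample */
    for( int i = NTAPS - 1; i > 0; --i )
        history[i] = history[i-1];
    history[0] = x;

    /* output is a weighted sum of recent inputs only */
    double y = 0.0;
    for( int i = 0; i < NTAPS; ++i )
        y += coeffs[i] * history[i];
    return y;
}
```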
<h3>
Which Filter?</h3>
<div>
When deciding which filter to use, there are many things to take into account. Here are some of those things:</div>
<br />
<ul>
<li>Some people feel that linear phase FIR filters sound more natural and have fewer "artifacts".</li>
<li>FIR filters are usually much more processor intensive for the same response.</li>
<li>FIR filters have "ripple" in both the passband and stopband, meaning the response is "jumpy". IIR filters can be designed without any ripple.</li>
<li>IIR filters can be easily designed to sound like analog filters.</li>
<li>IIR filters require careful design to ensure stability and good numerical error properties, however, that art is fairly advanced.</li>
<li>FIR filters generally have a higher latency.</li>
</ul>
<br />
Bjorn Roche<br />
<br />
<h3>
Basic Audio EQs (2012-08-23)</h3>
<a href="http://blog.bjornroche.com/2012/08/why-eq-is-done-in-time-domain.html">In my last post</a>, I looked at why it's usually better to do EQ (or filtering) in the time domain than in the frequency domain as far as audio is concerned, but I didn't spend much time explaining how you might implement a time-domain EQ. That's what I'm going to do now.<br />
<br />
The theory behind time-domain filters could fill a book. Instead of trying to cram you full of theory we'll just skip ahead to what you need to know to do it. I'll assume you already have some idea of what a filter is.<br />
<h3>
Audio EQ Cookbook</h3>
The <a href="http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt">Audio EQ Cookbook</a> by <span style="white-space: pre-wrap;">Robert Bristow-Johnson is a great, albeit very terse, description of how to build basic audio EQs. These EQs can be described as second order digital filters, sometimes called "<a href="http://en.wikipedia.org/wiki/Digital_biquad_filter">biquads</a>" because the equation that describes them contains two quadratics. In audio, we sometimes use other kinds of filters, but second order filters are a real workhorse. First order filters don't do much: they generally just allow us to adjust the overall balance of high and low frequencies. This can be useful in "tone control" circuits, like you might find on some stereos and guitars, but not much else. Second order filters give us more control -- we can "dial in" a specific frequency, or increase or decrease frequencies above and below a certain threshold, with a fair degree of accuracy, for example. If we need even more control than a second order filter offers, we can often simply take several second order filters and place them in series to simulate the effect of a single higher order filter.</span><br />
<span style="white-space: pre-wrap;"><br /></span>
<span style="white-space: pre-wrap;">Notice I said series, though. Don't try putting these filters in parallel, because they not only alter the frequency response, but also the phase response, so when you put them in parallel you might get unexpected results. For example, if you take a so-called all-pass filter and put it in parallel with no filter, the result will not be a flat frequency response, even though you've combined the output of two signals that have the same frequency response as the original signal.</span><br />
<span style="white-space: pre-wrap;"><br /></span>
<span style="white-space: pre-wrap;">Using the Audio EQ Cookbook, we can design a peaking, high-pass, low-pass, band-pass, notch (or band-stop), or shelving filter. These are the basic filters used in audio. </span><span style="white-space: pre-wrap;">We can even design that crazy all-pass filter I mentioned which actually does come in handy if you are building a phaser. (It has other uses, too, but that's for another post.)</span><br />
<h3>
<span style="white-space: pre-wrap;">Bell Filter</span></h3>
<span style="white-space: pre-wrap;">Let's design a "bell", or "peaking", filter using RBJ's cookbook. Most other filters in the cookbook are either similar to the bell or simpler, so once you understand the bell, you're golden. To start with, you will need to know the sample rate of the audio going into and coming out of your filter, and the center frequency of your filter. The center frequency, in the case of the bell filter, is the frequency that is "most affected" by your filter. You will also want to define the width of the filter, which can be done in a number of ways, usually with some variation on "Q" (or "quality factor") and "bandwidth". RBJ's filters define bandwidth in octaves, and you want to be careful that you don't extend the top of the bandwidth above the Nyquist frequency (1/2 the sample rate), or your filter won't work. We also need to know how much gain to apply at our center frequency, in dB (to cut, we just use a negative value, and for no change, we set it to 0).</span><br />
<span style="white-space: pre-wrap;"><br /></span>
<span style="white-space: pre-wrap;"><span style="font-family: Courier New, Courier, monospace;">Fs = Sample Rate</span></span><br />
<span style="white-space: pre-wrap;"><span style="font-family: Courier New, Courier, monospace;">F0 = Center Frequency (always less than Fs/2)</span></span><br />
<span style="white-space: pre-wrap;"><span style="font-family: Courier New, Courier, monospace;">BW = Bandwidth in octaves</span></span><br />
<span style="white-space: pre-wrap;"><span style="font-family: Courier New, Courier, monospace;">g = gain in dB</span></span><br />
<span style="white-space: pre-wrap;"><br /></span>
<span style="white-space: pre-wrap;">Great! Now we are ready to begin our calculations. First, RBJ suggests calculating some intermediate values:</span><br />
<span style="white-space: pre-wrap;"><br /></span>
<span style="white-space: pre-wrap;"><span style="font-family: Courier New, Courier, monospace;">A = 10^(g/40)</span></span><br />
<span style="white-space: pre-wrap;"><span style="font-family: Courier New, Courier, monospace;">w0 = 2*pi*F0/Fs
c = cos(w0)
s = sin(w0)
alpha = s*sinh( ln(2)/2 * BW * w0/s )</span></span><br />
<span style="white-space: pre-wrap;"><span style="font-family: Courier New, Courier, monospace;"><br /></span></span>
<span style="white-space: pre-wrap;">This is a great chance to use that hyperbolic</span><span style="font-family: inherit;"><span style="white-space: pre-wrap;"> sin button on your scientific </span></span><span style="white-space: pre-wrap;">calculator</span><span style="font-family: inherit;"><span style="white-space: pre-wrap;"> that, until now, has only been collecting dust. Now that we've done that, we can finally calculate the filter </span></span><span style="white-space: pre-wrap;">coefficients</span><span style="font-family: inherit;"><span style="white-space: pre-wrap;">, which we use when actually processing data:</span></span><br />
<span style="font-family: inherit;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">b0 = 1 + alpha*A
b1 = -2*c
b2 = 1 - alpha*A
a0 = 1 + alpha/A
a1 = -2*c
a2 = 1 - alpha/A</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="white-space: pre-wrap;"><span style="font-family: inherit;">Generally speaking, we want to "normalize" these </span></span><span style="white-space: pre-wrap;">coefficients</span><span style="white-space: pre-wrap;"><span style="font-family: inherit;">, so that </span><span style="font-family: Courier New, Courier, monospace;">a0 = 1</span><span style="font-family: inherit;">. We can do this by dividing each coefficient by </span><span style="font-family: Courier New, Courier, monospace;">a0.</span><span style="font-family: inherit;"> Do this in advance or the electrical engineers will laugh at you:</span></span><br />
<span style="white-space: pre-wrap;"><span style="font-family: inherit;"><br /></span></span>
<span style="font-family: 'Courier New', Courier, monospace; white-space: pre-wrap;">b0 /= a0
b1 /= a0
b2 /= a0
a1 /= a0
a2 /= a0</span><br />
<span style="white-space: pre-wrap;"><span style="font-family: Courier New, Courier, monospace;"><br /></span></span>
<span style="white-space: pre-wrap;"><span style="font-family: inherit;">Now, in pseudocode, here's how we process our data, one sample at a time using a "process" function that looks something like this:</span></span><br />
<br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">number xmem1, xmem2, ymem1, ymem2;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">void reset() {</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"> xmem1 = xmem2 = ymem1 = ymem2 = 0;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">}</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">number process( number x ) {</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"> number y = b0*x + b1*xmem1 + b2*xmem2 - a1*ymem1 - a2*ymem2;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"> xmem2 = xmem1;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"> xmem1 = x;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"> ymem2 = ymem1;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"> ymem1 = y;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"> return y;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">}</span></span><br />
<br />
You'll probably have some kind of loop that your process function goes in, since it will get called once for each audio sample.<br />
<br />
<span style="font-family: inherit;"><span style="white-space: pre-wrap;">There's actually more than one way to implement the process function given that particular set of coefficients. This implementation is called "Direct Form I" and happens to work pretty darn well most of the time. "Direct form II" has some admirers, but those people are either suffering from graduate-school-induced trauma or actually have some very good reason for doing what they are doing that in all </span></span><span style="white-space: pre-wrap;">likelihood</span><span style="font-family: inherit;"><span style="white-space: pre-wrap;"> does not apply to you. There are of course other implementations, but DFI is a good place to start.</span></span><br />
<span style="white-space: pre-wrap;"><span style="font-family: inherit;"><br /></span></span>
<span style="white-space: pre-wrap;"><span style="font-family: inherit;">You may have noticed that the output of the filter, </span><span style="font-family: Courier New, Courier, monospace;">y</span><span style="font-family: inherit;">, is stored and used as an input to future iterations. The filter is therefore "recursive". This has several implications:</span></span><br />
<br />
<ul>
<li><span style="white-space: pre-wrap;"><span style="font-family: inherit;">The filter is fairly sensitive to errors in the recursive values and coefficients</span></span><span style="white-space: pre-wrap;"><span style="font-family: inherit;">. Because of this, we need to take care of what happens with the error in our </span><span style="font-family: Courier New, Courier, monospace;">y</span><span style="font-family: inherit;"> values. In practice, on computers, we usually just need to use a high resolution floating point value (ie double precision) to store these (on fixed point hardware, it is often another matter).</span></span></li>
<li><span style="white-space: pre-wrap;"><span style="font-family: inherit;">Another issue is that you can't just blindly set the values of your coefficients, or your filter may become unstable. Fortunately, the coefficients that come out of RBJ's equations always result in stable filters, but don't go messing around. For example, you might be tempted to interpolate coefficients from one set of values to another to simulate a filter sweep. Resist this temptation or you will unleash the numerical fury of hell! The values in between will be "unstable", meaning that your output will run off to infinity. Madness, delirium, vomiting and broken speakers are often the unfortunate casualties.</span></span></li>
<li><span style="font-family: inherit;"><span style="white-space: pre-wrap;">On some platforms you will have to deal with something called "denormal" numbers. This is a major <a href="http://musicdsp.org/files/denormal.pdf">pain in the ass</a>, I'm sorry to say. Basically, it means your performance can be between 10 and 100 times worse than it should be because the CPU is busy calculating tiny numbers you don't care about. This is one of the rare cases where I would advocate optimizing before you measure a problem, because the issue can appear and disappear as your code changes, and it's very hard to trace. In this case, the easiest solution is probably to do something like this (imagine we are in C for a moment):</span></span></li>
</ul>
<span style="white-space: pre-wrap;"><br /></span>
<br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">#define IS_DENORMAL(f) (((*(unsigned int *)&amp;(f))&amp;0x7f800000) == 0)</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">float xmem1, xmem2, ymem1, ymem2;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">float a1, a2, b0, b1, b2; /* coefficients, computed from RBJ's equations */</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">void reset() {</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">   xmem1 = xmem2 = ymem1 = ymem2 = 0;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">}</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">float process( float x ) {</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">   float y = b0*x + b1*xmem1 + b2*xmem2 - a1*ymem1 - a2*ymem2;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">   if( IS_DENORMAL( y ) )</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">      y = 0;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">   xmem2 = xmem1;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">   xmem1 = x;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">   ymem2 = ymem1;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">   ymem1 = y;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">   return y;</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;">}</span></span><br />
<span style="font-family: Courier New, Courier, monospace;"><span style="white-space: pre-wrap;"><br /></span></span>
<span style="white-space: pre-wrap;"><span style="font-family: inherit;">Okay, happy filtering!</span></span>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com20tag:blogger.com,1999:blog-7225698277211840079.post-47807127501467926262012-08-08T14:58:00.000-07:002012-09-28T10:21:29.455-07:00Why EQ Is Done In the Time Domain<a href="http://blog.bjornroche.com/2012/08/when-to-not-use-fft.html">In my last post</a>, I discussed how various audio processing may be best done in the frequency or time domain. Specifically, I suggested that EQ, which is a filter that alters the frequency balance of a signal, is best done in the time domain, not the frequency domain. (See my next post if you want to learn <a href="http://blog.bjornroche.com/2012/08/basic-audio-eqs.html">how to implement a time-domain filter</a>.)<br />
<br />
If this seems counterintuitive to you, rest assured you are not alone. I've been following the "audio" and "FFT" tags (among others) on <a href="http://www.stackoverflow.com/">Stack Overflow</a> and it's clear that many people attempt to implement EQs in the frequency domain, only to find that they run into a variety of problems.<br />
<h2>
Frequency Domain Filters</h2>
Let's say you want to eliminate or reduce high frequencies in your signal. This is called a "low-pass" filter or, less commonly, a "high-cut" filter. In the frequency domain, high frequencies get "sorted" into designated "bins", where you can manipulate them or even set them to zero. This seems like an ideal way to do low-pass filtering, but let's explore the process to see why it might not work out so well.<br />
<br />
Our first attempt at a low-pass filter, implemented with the FFT might look something like this:<br />
<ul>
<li>loop on audio input</li>
<li>if enough audio is received, perform FFT, which gives us audio in the frequency domain</li>
<ul>
<li>in frequency domain, perform manipulations we want. In the case of eliminating high frequencies, we set the bins representing high frequencies to 0.</li>
<li>perform inverse FFT, to get audio back in time domain</li>
<li>output that chunk of audio</li>
</ul>
</ul>
<br />
But there are quite a few problems with that approach:<br />
<ul>
<li>We must wait for a chunk of audio before we can even begin processing, which means we incur latency. The higher the quality of the filter we want, the more audio we need to wait for. If the input buffer size does not match the FFT size, extra buffering needs to be done.</li>
<li>The FFT, though efficient compared to the DFT (which is the FFT without the "fast" part), performs worse than linear time, and we need to do both the FFT and its inverse, which is computationally similar. EQing with the FFT is therefore generally very inefficient compared to equivalent time-domain filters.</li>
<li>Because our output chunk has been processed in the frequency domain independent of samples in neighboring chunks, the audio in neighboring chunks may not be continuous. One solution is to process the entire file as one chunk (which only works for offline, rather than real-time processing, and is computationally expensive). The better solution is the <a href="http://en.wikipedia.org/wiki/Overlap%E2%80%93add_method">OLA or Overlap Add method</a> but this involves complexity that many people miss when implementing a filter this way.</li>
<li>Filters implemented via FFT, as well as time-domain filters designed via the inverse FFT, often do not perform the way people expect. For example, many people expect that if they set all values in bins above a certain frequency to 0, then all frequencies above the given frequency will be eliminated. This is not the case. Instead, frequency responses <i>at</i> the bin values will be 0, but the frequency response <i>between</i> those values is free to fluctuate -- and it does fluctuate, often greatly. This fluctuation is called "ripple." There are techniques for reducing ripple, but they are complex, and they don't eliminate ripple entirely. Note that, in general, frequencies across the entire spectrum are subject to ripple, so even manipulating a small frequency band may create ripple across the entire frequency spectrum.</li>
<li>FFT filters suffer from so-called "pre-echo", where sounds can be heard before the main sound hits. In and of itself, this isn't really a problem, but sounds are "smeared" so badly by many designs that many in the audio world feel these filters can affect the impact of transients and stereo imaging if not implemented and used correctly.</li>
</ul>
So it's clear that FFT filters may not be right, or if they are, they involve much more complexity than many people first realize.<br />
<br />
As a side note, one case where it might be worth all that work is a special case of so-called FIR filters (also sometimes called "linear phase" filters). These are sometimes used in audio production and in other fields. In audio, they are usually used only in mastering because of their high latency and computational cost, but even then, many engineers don't like them (while others swear by them). FIR filters are best implemented in the time domain as well, until the number of "taps" in the filter becomes enormous, which it sometimes does, and it actually becomes more efficient to use an FFT with OLA. FIR filters suffer from many of the problems mentioned above, including pre-echo, high computational cost and latency, but they do have some acoustical properties that make them desirable in some applications.<br />
<h2>
Time Domain Filters</h2>
Let's try removing high frequencies in the time domain instead. In the time domain, high frequencies are represented by the parts of the signal that change quickly, and low frequencies are represented as the parts that change slowly. One simple way to remove high frequencies, then, would be to use a moving average filter:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">y(n) = { x(n) + x(n-1) + .... + x(n-M) } / (M+1)</span><br />
<br />
<span style="font-family: inherit;">where </span><span style="font-family: Courier New, Courier, monospace;">x(i)</span><span style="font-family: inherit;"> is your input sample at time </span><span style="font-family: Courier New, Courier, monospace;">i</span><span style="font-family: inherit;">, and </span><span style="font-family: Courier New, Courier, monospace;">y(i)</span><span style="font-family: inherit;"> is your output sample at time </span><span style="font-family: Courier New, Courier, monospace;">i</span><span style="font-family: inherit;">. No FFT required. (This is not the best filter for removing high frequencies -- in fact we can do WAY better -- but it is my favorite way to illustrate the point. The moving average filter is not uncommon in economics, image processing and other fields partly for this reason.) Several advantages are immediately obvious, and some are not so obvious:</span><br />
<ul>
<li><span style="font-family: inherit;">Each input sample can be processed one at a time to produce one output sample without having to chunk or wait for more audio. Therefore, there are also no continuity issues and minimal latency.</span></li>
<li><span style="font-family: inherit;">It is extremely efficient, with only a few multiplies, adds and memory stores/retrievals required per sample.</span></li>
<li><span style="font-family: inherit;">These filters can be designed to closely mimic analog filters.</span></li>
</ul>
A major disadvantage is that it is not immediately obvious how to design a high-quality filter in the time domain. In fact, it can take some serious math to do so. It's also worth noting that many time-domain filters, like frequency domain filters, also suffer from ripple, but for many design methods, this ripple is well defined and can be limited in various ways.<br />
<br />
In the end, the general rule is that for a given performance, you can get much better results with the time-domain than the frequency domain.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com8tag:blogger.com,1999:blog-7225698277211840079.post-17639118610607295922012-08-04T09:15:00.003-07:002012-08-04T09:15:47.475-07:00When to (not) use the FFT<a href="http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html">In the last post</a> I discussed one use for the FFT: pitch tracking. I also mentioned that there were better ways to do pitch tracking. Indeed, aside from improvements on that method, you could also use entirely different methods that don't rely on the FFT at all.<br />
<br />
The FFT transforms data into the "frequency domain", or, if your data is broken down into chunks, the FFT transforms it into the "time-frequency domain," which we often still think of as the frequency domain. However, the most basic "domain" you can work in is usually the "time domain." In the time domain, audio is represented as a sequence of amplitude values. You may know this as "PCM" audio. This is what's usually stored in WAVs and AIFs, and when we access audio devices like soundcards, this is the most natural way to transfer data. It turns out we can do a whole lot of processing and analysis in the time domain as well.<br />
<br />
<table bgcolor="#dddddd" border="0" cellpadding="4" cellspacing="0" frame="hsides" width="100%">
<tbody>
<tr><th>Process</th><th>Time Domain</th><th>Frequency Domain</th></tr>
<tr><td>Filtering/<br />
EQ</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr>
<tr><td>Pitch Shifting</td><td bgcolor="#ffffdd">Okay</td><td bgcolor="#ffffdd">Okay</td></tr>
<tr><td>Pitch Tracking</td><td bgcolor="#ffffdd">Okay</td><td bgcolor="#ffffdd">Okay</td></tr>
<tr><td>Reverb<br />
(Simulated)</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr>
<tr><td>Reverb<br />
(Impulse)</td><td bgcolor="#ffdddd">No!</td><td bgcolor="#ddffdd">Yes!</td></tr>
<tr><td>Guitar effects<br />
Chorus/flanger/distortion/etc</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr>
<tr><td>SR Conversion</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr>
<tr><td>Compression</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr>
<tr><td>Panning, Mixing, etc</td><td bgcolor="#ddffdd">Yes!</td><td bgcolor="#ffdddd">No!</td></tr>
</tbody><caption>Table 1: Recommendations for Audio Processing in the Time Domain vs. the Frequency Domain</caption>
</table>
<br />
<br />
Wow, so impulse reverb is really the only thing on that list you need an FFT for? Actually even that can be done in the time domain, it's just much more efficient in the frequency domain (so much so that it might be considered impossible in the time domain).<br />
<br />
You might wonder how to adjust the frequency balance of a signal, which is what an EQ does, in the time domain rather than the frequency domain. Well, you <i>can</i> do it in the frequency domain, but you are asking for trouble. I'll talk about this in my next post.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com1tag:blogger.com,1999:blog-7225698277211840079.post-61514722129569357072012-07-22T18:45:00.001-07:002017-02-23T11:31:45.831-08:00Frequency detection using the FFT (aka pitch tracking) With Source Code<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="http://1.bp.blogspot.com/-4hK5HJz3Z3k/UelNne2ktPI/AAAAAAAAADI/k9WFYnx6KHY/s1600/new_fft2.gif" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://1.bp.blogspot.com/-4hK5HJz3Z3k/UelNne2ktPI/AAAAAAAAADI/k9WFYnx6KHY/s320/new_fft2.gif" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">It's not necessarily as simple as it seems to find the pitch<br />
from an FFT. Some pre-processing is required as well<br />
as some knowledge of how the data is organized.</td></tr>
</tbody></table>
How to track pitch with the FFT seems to be a very commonly asked question on <a href="http://stackoverflow.com/">stack overflow</a>. Many people seem to think tracking pitch is as simple as putting your data into an FFT, and looking at the result. Unfortunately, this is not the case. Simply applying an FFT to your input, even if you know what size FFT to use, is not going to give you optimal results, although it might work in some cases.<br />
<br />
At the end of the day, the FFT is not actually the best method available for tracking or detecting the pitch of an audio signal. While it is possible to make a good pitch tracker using the FFT, doing it right requires a tremendous amount of work. The algorithm shown here works, and works pretty well, but if you need something that converges on the correct pitch really quickly, is very accurate, or tracks multiple notes simultaneously, you need something else.<br />
<br />
Still, you can create a decent pitch tracking algorithm that's reasonably easy to understand using the FFT. It doesn't require too much work, and I've explained it and provided code, in the form of a command-line C <a href="https://github.com/bejayoharen/guitartuner">guitar tuner</a> app which you can get from github. It compiles and runs on Mac OS X and you should be able to get it to run on other platforms without much trouble. If you want to port it to other languages, that shouldn't be too hard either. It's worth noting that I specifically designed this app to be similar to the tuner described by Craig A. Lindley in <a href="http://www.amazon.com/Digital-Audio-Java-Craig-Lindley/dp/0130876763">Digital Audio with Java</a>, so if you are looking for Java source code, you can check out his code (although there are differences between his code and mine).<br />
<h2>
The Big Picture</h2>
<div>
To do our pitch detection, we basically loop on the following steps:</div>
<br />
<ol>
<li>Read enough data to fill the FFT</li>
<li>Low-pass the data</li>
<li>Apply a window to the data</li>
<li>Transform the data using the FFT</li>
<li>Find the peak value in the transformed data</li>
<li>Compute the peak frequency from the index of the peak value in the transformed data</li>
</ol>
<br />
This is the main processing loop for the tuner, with some stuff left out:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">while( running )</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    // read some data</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    err = Pa_ReadStream( stream, data, FFT_SIZE );</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">    // low-pass</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    for( int j=0; j&lt;FFT_SIZE; ++j ) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">       data[j] = processSecondOrderFilter( data[j], mem1, a, b );</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">       data[j] = processSecondOrderFilter( data[j], mem2, a, b );</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    }</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    // window</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    applyWindow( window, data, FFT_SIZE );</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">    // do the fft</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    for( int j=0; j&lt;FFT_SIZE; ++j )</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">       datai[j] = 0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    applyfft( fft, data, datai, false );</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">    //find the peak</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    float maxVal = -1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    int maxIndex = -1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    for( int j=0; j&lt;FFT_SIZE; ++j ) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">       float v = data[j] * data[j] + datai[j] * datai[j] ;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">       if( v &gt; maxVal ) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">          maxVal = v;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">          maxIndex = j;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">       }</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    }</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    float freq = freqTable[maxIndex];</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">    //...</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> }</span><br />
<br />
Let's go over each of the steps and see how they work.<br />
<h2>
Audio Data</h2>
We always need to start with a sequence of numbers representing the amplitude of audio over time (sometimes called "linear PCM" audio). This is what we get from most uncompressed audio formats like AIFF and WAV. It's also what you get from audio APIs like ASIO, CoreAudio and ALSA. In this case, we are using <a href="http://portaudio.com/">PortAudio</a>, which acts like a portable wrapper around these and other APIs. If you have a compressed format such as MP3 or OGG, you will have to convert it to uncompressed audio first.<br />
<br />
Your data might be 16-bit integer, 8-bit integer, 32-bit floating point or any number of other formats. We'll assume you know how to get your data to floating point representation in the range from -1 to 1. PortAudio takes care of this for us when we specify these input parameters:<br />
<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">inputParameters.device = Pa_GetDefaultInputDevice();</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> inputParameters.channelCount = 1;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> inputParameters.sampleFormat = paFloat32;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> inputParameters.suggestedLatency = Pa_GetDeviceInfo( inputParameters.device )->defaultHighInputLatency ;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> inputParameters.hostApiSpecificStreamInfo = NULL;</span><br />
<br />
<br />
You'll also need to know how often your audio is sampled. For a tuner, less is more, so we'll use a sample rate of 8 kHz, which is available on most hardware. This is extremely low for most audio applications (44.1 kHz is considered standard for audio and 48 kHz is standard for video), but for a tuner, 8 kHz is plenty.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">#define SAMPLE_RATE (8000)</span><br />
<br />
<h2>
Low-Pass Filtering</h2>
There's no hard and fast rule about low-pass filtering (or simply "low-passing") your audio data. In fact, it's not even strictly necessary, but doing so can get rid of unwanted noise and the higher frequencies that sometimes masquerade as the fundamental frequency. This is important because some instruments have component frequencies called harmonics that are more powerful than the "fundamental" frequencies, and usually we are interested in the fundamental frequencies. Filtering, therefore, can improve the reliability of the rest of the pitch tracker significantly. Without filtering, some noise might appear to be the dominant pitch, or, more likely, the dominant pitch might appear to be a harmonic of the actual fundamental frequency.<br />
<br />
A good choice is a low-pass filter with a cutoff frequency around or a little above the highest pitch you expect to detect. For a guitar tuner, this might be the high E string, or about 330 Hz. So that's what we'll use -- in fact, we low-pass it twice. If you are modifying the code for another purpose, you can set the cutoff frequency to something that makes sense for your application.<br />
<br />
If you aren't sure, or you want something less aggressive, you could try a moving average filter, which simply outputs the average of the current input and some number of previous inputs. Intuitively, we can understand that this filter reduces high frequencies because signals that change quickly get "smoothed" out.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">// Process every sample of your input with this function</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">// (this is not used in our guitar tuner)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">float twoPointMovingAverageFilter( float input ) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> static float lastInput = 0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> float output = ( input + lastInput ) / 2 ;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> lastInput = input;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> return output;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<br />
The moving average filter won't make a huge difference, but if the low pass filter I used in my code doesn't suit you and you don't have the degree in electrical engineering required to design the right digital filter (or don't know what the right filter is), it might be better than nothing. I haven't tested the moving average filter myself.<br />
<h2>
Windowing</h2>
Generally speaking, FFTs work in chunks of data, but your input is a long or even continuous stream. To fit this round peg into this square hole, you need to break off chunks of your input, and process the chunks. However, doing so without proper treatment may prove detrimental to your results. In rough terms, the problem is that the edges get lopped off very sloppily, creating artifacts at frequencies that aren't actually present in your signal. These artifacts, called "sidelobes", cause problems for many applications. I know that some tuners are designed without special treatment, so you can skip this step, but I strongly recommend you keep reading because it's easy to deal with this problem.<br />
<br />
To reduce the sidelobes, we premultiply each chunk of audio with another signal called a window, or window function. Two simple and popular choices for window functions are the <a href="http://en.wikipedia.org/wiki/Window_function#Hamming_window">Hamming window</a>, and the <a href="http://en.wikipedia.org/wiki/Hann_function">Hann window</a>. I put code for both in the tuner, but I used the Hann window.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">void buildHanWindow( float *window, int size )</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">{</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">   for( int i = 0; i &lt; size; ++i )</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> window[i] = .5 * ( 1 - cos( 2 * M_PI * i / (size-1.0) ) );</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">void applyWindow( float *window, float *data, int size )</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">{</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">   for( int i = 0; i &lt; size; ++i )</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> data[i] *= window[i] ;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<br />
For a tuning app, the windows may overlap, or there may be gaps in between them, depending on your needs and your available processing power. For example, by overlapping and performing more FFTs, and then averaging the results, you may get more accurate results more quickly, at the cost of more CPU time. <b>I strongly recommend doing this in real apps. I did not do this in my app to make the code easier to follow, and you'll see that the values sometimes jump around and don't respond smoothly.</b><br />
<h2>
FFT</h2>
The FFT, or Fast Fourier Transform, is an algorithm for quickly computing the frequencies that comprise a given signal. By quickly, we mean O( N log N ). This is way faster than the O( N<sup>2</sup> ) that computing the Fourier transform took before the "fast" algorithm was worked out, but still not linear, so you are going to have to be mindful of performance when you use it. Because the FFT is now the standard way to compute the Fourier transform, many people use the terms interchangeably, even though this is not strictly correct.<br />
<br />
The FFT works on a chunk of samples at a time. You don't get more or less data out of a Fourier Transform than you put into it, you just get it in another form. That means that if you put ten audio samples in you get ten data-points out. The difference is that these ten data points now represent energy at different frequencies instead of energy at different times, and since our data uses real numbers, and not complex, the FFT will contain some redundancies -- specifically, only the first half of the spectrum contains relevant data. That means that for ten samples in, we really only get five relevant data-points out.<br />
<br />
Clearly, the more frequency resolution you need, the more time data you need to give it. However, at some point you will run into the problem of not being able to return results quickly enough, either because you are waiting for more input, or because it takes too long to process. Choosing the right size FFT is critical: too big and you consume lots of CPU and delay getting a response, too small and your results lack resolution.<br />
<br />
How do we know how big our FFT should be? You can determine the frequency resolution (bin size) of your FFT with this simple formula:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">binSize = sampleRate/N ;</span><br />
<br />
For example, with an FFT size (N) of 8192 (most implementations of the FFT work best with powers of 2), and a sample rate of 44100, you can expect to get results that are accurate to within about 5.38 Hz. Not great for a tuner, but, hey, that's why we are sampling at 8000 Hz, which gives us an accuracy of better than 1 Hz. Still not perfect for, say, a 5-string bass, but you can always use a larger N if you need to. Keep in mind that getting enough samples for that much accuracy takes longer than a second, so our display only updates about once a second. That's yet another reason you might want to overlap your windows.<br />
<br />
The output of the FFT is an array of N complex numbers. It is possible to use both the real and imaginary part to get very accurate frequency information, but for now we'll settle for something simpler and much easier to understand: we simply look at the magnitude. To find the magnitude of each frequency component, we use the distance formula:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">for( int i = 0; i &lt; N/2; ++i )</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> magnitude[i] = sqrt( real[i]*real[i] + cmpx[i]*cmpx[i] );</span><br />
<br />
Now that we know the magnitude of each FFT bin, finding the frequency is simply a matter of finding the bin with the maximum magnitude. The frequency will then be the bin number times the bin size, which we computed earlier. Note that we don't actually need to compute the square root to find the maximum magnitude, so our actual code skips that step.<br />
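Put together, the peak search might look like the following C sketch. This is a hypothetical helper, not the tuner's actual code; note that it compares squared magnitudes, so no sqrt is needed:

```c
#include <assert.h>

/* Return the frequency (in Hz) of the strongest FFT bin, given the
   real and imaginary output arrays of an N-point FFT. Only the first
   N/2 bins are examined, since the rest are redundant for real input. */
double peakFrequency( const double *real, const double *imag,
                      int n, double sampleRate )
{
    double binSize = sampleRate / n;
    int maxBin = 0;
    double maxMagSquared = 0;
    for( int i = 0; i < n / 2; ++i ) {
        /* compare squared magnitudes; the sqrt doesn't change the winner */
        double magSquared = real[i] * real[i] + imag[i] * imag[i];
        if( magSquared > maxMagSquared ) {
            maxMagSquared = magSquared;
            maxBin = i;
        }
    }
    return maxBin * binSize;
}
```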
<h2>
More</h2>
We do a bit more in our code, like identifying the nearest semi-tone and finding the difference between that semi-tone and the identified frequency, but for stuff like that we'll leave the code to speak for itself.<h2>FCC calls for quieter commercials, but how?</h2><a href="http://perezhilton.com/tag/calm_act/#.TuoQ7Zifu04">In</a> <a href="http://online.wsj.com/article/SB10001424052970203430404577092530932474076.html">the</a> <a href="http://www.cbsnews.com/8301-503544_162-57342486-503544/fcc-passes-rules-banning-extra-loud-commercials/">news</a> recently is the so-called "CALM Act" (<a href="http://en.wikipedia.org/wiki/Commercial_Advertisement_Loudness_Mitigation_Act">Commercial Advertisement Loudness Mitigation Act</a>), which will force TV and cable broadcasters (specifically, multichannel broadcasters) to make advertisements and program content the same volume.<br />
<br />
The problem of blaring commercials, the TV equivalent of the <a href="http://en.wikipedia.org/wiki/Loudness_wars">loudness wars</a>, has been going on for some time, but with newer technologies, including digital broadcasting, it has gotten worse. The fundamental issue is that advertisers want to be heard, so they want to be louder than their competition (the program material). However, it's not just a matter of submitting content with higher volume -- broadcasters, whether analog or digital, have limits on the maximum volume they transmit. Instead, advertisers use recording tricks called compression and limiting to boost the average levels of their recordings while keeping the maximum just within limits. The result is a commercial that sounds louder than the program.<br />
<br />
While digital technology has made it possible to take this loudness to an extreme, digital distribution has also provided one part of the solution: each piece of program material can be pre-marked with loudness information using a standard called <a href="http://www.atsc.org/cms/index.php/standards/recommended-practices/185-a85-techniques-for-establishing-and-maintaining-audio-loudness-for-digital-television">A/85 rp</a> which is used by the consumer's television to determine playback volume.<br />
<br />
The trick is to accurately determine the loudness of the material, so that the A/85 tags can be correctly applied. As it turns out, this is no simple task. The ear is more sensitive to some frequencies than others, and you don't want to use simple averaging because then long periods of silence would allow commercials to get away with disproportionately loud short segments. To get around these issues, A/85 rp recommends the use of a well-researched standard called ITU-R BS.1770 (which may be more familiar as the basis of the EBU metering and normalization standard, EBU R 128). The ITU standard allows the measurement of loudness in a way that very closely matches human perception, and offers recommendations for use in live, short- and long-form content.<br />
<br />
Will the system be gamed? Perhaps content creators will find some way to trick the ITU measurement system into making their content appear less loud than it really is, but even if they do, it seems unlikely that they will be able to game the system anywhere near as well as they currently do.<br />
<br />
How will the FCC know if the system is working, and which broadcasters are using the system? They will rely on the public to call in complaints. Of course, since this has, for years, been the number-one complaint they have received, I don't anticipate too much difficulty there.<h2>Linear Interpolation for Audio in C, C++, Java, etc.</h2>Linear interpolation in digital audio came up recently, so I'm posting it here. I hope it's useful for other folks.<br />
<br />
Technically, linear interpolation is the act of fitting a line through existing points and computing new data points from that line. This might sound complex, but it turns out to be pretty easy, and we can do it with a few lines of code.<br />
<br />
Visually, we can think about drawing a line between two points, and then being able to find the y value for any given x. However, I actually think it's easier to think of it non-graphically because linear interpolation is really just a kind of weighted average.<br />
<br />
For audio, we frequently want to use linear interpolation because it's easy to implement, computationally efficient, and "smooth" in a sense I won't get into here, other than to say that it generally does not create clicks and pops when you don't want them. Linear interpolation is useful for handling fader changes, button-push "de-bouncing" and so on, and it's often great for simple cross-fades and the like.<br />
<br />
The formula for linear interpolation is derived from the formula for the line between two points. You can see <a href="http://en.wikipedia.org/wiki/Linear_interpolation">wikipedia</a> for the details. I am omitting it here and jumping straight to an example. To perform a linear interpolation of 100 samples where y[0] = 7, and y[100] = 20, our code would look something like this:<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">double start = 7;</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">double end = 20;</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">for( int i=0; i<100; ++i ) {</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> double ratio = i/100.0;</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">   y[i] = start * (1-ratio) + end * ratio;</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">}</span><br />
<br />
You can think of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">1-ratio</span> as the weight given to the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">start</span> variable, and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ratio</span> as the weight given to the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">end</span> variable. As we slide through the samples, we slowly transition from the start value to the end value.<br />
<br />
Notice I've been very careful to make sure <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">y[0]</span> is actually 7, and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">y[99]</span> is not quite 20, so that <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">y[100]</span> will smoothly transition to 20 as required. Off-by-one errors can screw this up, and while you might not hear the difference, you want to get that right or you could end up with pops, weird overs, or other subtle problems.<br />
<br />
Now you might say that the above code is not very efficient. You can improve on it somewhat using the code below, but be aware that if you are interpolating over a large number of samples, especially if you are using single-precision floats, the accumulated rounding error means you might not quite end up where you expect. The performance gain for this more complex code is likely to be minimal on modern computer hardware, but may be substantial on DSP hardware, where operations like floating point adds take much less time than floating point divides. A clever compiler could theoretically produce the same object code from these two snippets if it can determine that precision won't be an issue.<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">double start = 7;</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">double end = 20;</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">int length = 100;</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">double interval = ( end - start ) / length;</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">y[0] = start;</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">for( int i=1; i &lt; length; ++i )</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> y[i] = y[i-1] + interval ;</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">By the time we get to the end, </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">y[length-1]</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"> should be </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">(length-1) * interval</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"> larger than </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">start</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"> -- one interval short of </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">end</span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">, just like the first version, which is exactly right.</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">That's all there is to linear interpolation, so let's go to an audio example: </span><span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;">say we want to go from off (muted) to on (unmuted) without a click. Instead of 7 and 20, we'd use 0.0 for off and 1.0 for on. Also, instead of setting the values in the array, we are going to be multiplying the values in the array, because that's how we do gain changes. Now, let's say we don't know what a good length of time is for unmuting, so let's just make that a variable. Below is a function that takes an array of mono samples and transitions them from off to on, starting at a given sample, over a given transition length. I haven't tested this exact code, but it should be good enough for illustrative purposes:</span><br />
<span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">void unmute( float data[],</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> int totalSamples, //how many samples in our array</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">             int startUnmute, //when do we start unmuting?</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> int transitionLength ) //how long is our transition?</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">{</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> //basic sanity check:</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">   if( startUnmute + transitionLength > totalSamples )</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> exit( -1 ); //or throw an exception if this were java</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> //process the muted samples:</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">   for( int i=0; i &lt; startUnmute; ++i )</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> data[i] = 0; //effectively multiplied by zero</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><br />
</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> //process the transition samples.</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> // this is where the linear interpolation</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> // happens. We are interpolating between 0 and 1,</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"> // and multiplying the samples by that value:</span><br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">   </span><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">for( int i=0; i &lt; transitionLength; ++i ) {</span><br />
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">      double ratio = i/(double)transitionLength;</span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">      data[i+startUnmute] *= ratio; //multiply by the ratio, which is transitioning from 0 to 1</span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">   }</span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">   // the rest of the samples don't need to be processed:</span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">   // they are effectively multiplied by 1 already.</span></div><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">}</span><h2>Linearity and dynamic range in Int->Float->Int</h2><i>Update: </i><a href="http://blog.bjornroche.com/2010/04/comments-on-conversions.html"><i>some comments</i></a><i>.</i><br />
<br />
In my <a href="http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html">last blog post</a>, I discussed converting audio from integer to floating point back to integer, mostly from a programming perspective. I showed how there are a lot of ways to do the conversion. Most audio folks would say, "huh, I thought there were only two ways to convert floating point numbers to integers." And they'd be right: with and without dither. So what's all the fuss about?<br />
<br />
Indeed, that's a good question. Most audio folks have this expectation:<br />
<ol><li>When I have dither off and no effects (including volume, etc) I expect to be able to get out exactly what I put in.</li>
<li>When I have dither on, I expect it to sound good.</li>
</ol>Point 1 is what we referred to as bit transparency in the previous post, and we found lots of ways to achieve it. Point 2 is a bit more subtle. How do you make something sound good? In this case, we mean transparent, and what's especially critical is that we eliminate the truncation and IM distortion that are the hallmarks of <a href="http://www.digido.com/more-bits-please.html">cold, harsh digital audio</a>.<br />
<div class="" style="clear: both; float: right; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em; text-align: left; width: 320px;"><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/_-pZI7Bl38cw/Sx_qDM_VnsI/AAAAAAAAABA/ZYN_DBDChlo/s1600-h/matched-v-unmatched.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/_-pZI7Bl38cw/Sx_qDM_VnsI/AAAAAAAAABA/ZYN_DBDChlo/s320/matched-v-unmatched.png" /></a></div><small>Figure 1. Comparison of 16-bit conversion using the same scaling factor (matched) vs. different scaling factors (mismatched). Mismatched scaling factors come from Method 3 from previous post and matched are Method 2.</small></div><br />
What we need when it comes to transparency and avoiding that cold harsh sound is linearity. In this regard, the methods discussed in my last post, transparent or not, don't stack up equally. You might think you could judge them by inspection, but the mathematics are a bit more complex. Let's be clear about what we need to test: what we <i>don't</i> care about is how accurately a given conversion method responds to a DC signal: we aren't measuring the temperature or the amount of fuel in a tank. Rather, when we talk about linearity in audio we are referring to the ability to accurately translate dynamic information. Think about it: when you buy an analog-to-digital converter, you aren't concerned about its ability to accurately measure a certain input voltage, are you? No, you care about its frequency response and dynamic range. In the same way, we must ensure maximum signal-to-noise ratio and dynamic range in our conversions. It turns out not all the conversions from my last post have good dynamic performance.<br />
<br />
<span style="font-size: x-large;">Tests</span><br />
<br />
It is sometimes claimed that the percent error introduced by "mismatched" conversion (i.e. Method 3 from the previous post) is small, and therefore of little concern, but percent error is not what matters in a dynamic system such as audio, so we will not concern ourselves with that and will investigate the dynamic performance instead. In Figure 1 we show the results of "mismatched" conversion. In this case we are converting from a source signal of two sine waves in double precision to 16-bit integer (to simulate A/D conversion), then to single-precision floating point and back to 16-bit integer (to simulate a standard editing workflow), and finally back to double precision (to simulate D/A conversion). This is more or less the minimum error we can expect with the mismatched method if we use audio editing software but do not use DSP, and therefore represents a best-case scenario. In the dynamic analysis, it becomes clear that using different scaling factors produces more noise whether dither is used or not. In fact, the difference made by dither is dwarfed by the difference in techniques. Just as importantly, the quality of the noise is bad: rather than shifting the noise floor up, we see spikes indicating that the noise is likely to be audible even at low levels. These results also suggest that it is important to use the same scaling factors throughout the processing chain.<br />
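The mismatch is easy to reproduce in a few lines of C. The sketch below (helper names are mine; simple rounding stands in for the final conversion step, and dither is omitted) round-trips a 16-bit sample through floating point with mismatched scaling factors, Method 3 style, and with matched factors, Method 2 style:

```c
#include <assert.h>
#include <stdint.h>

/* Round a double to the nearest integer -- a stand-in for the final
   conversion step (no dither). */
static int16_t roundToInt16( double x )
{
    return (int16_t)( x < 0 ? x - 0.5 : x + 0.5 );
}

/* Method 3: divide by 0x8000 on the way to float, multiply by 0x7FFF
   on the way back -- the "mismatched" scaling factors. */
int16_t roundTripMismatched( int16_t s )
{
    double f = s / (double)0x8000;
    return roundToInt16( f * 0x7FFF );
}

/* Method 2: the same factor, 0x7FFF, in both directions. */
int16_t roundTripMatched( int16_t s )
{
    double f = s / (double)0x7FFF;
    return roundToInt16( f * 0x7FFF );
}
```

roundTripMatched returns its input unchanged, while roundTripMismatched does not (20000, for example, comes back as 19999): that per-sample error is what shows up as the extra noise in Figure 1.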
<br />
<div class="" style="clear: both; float: right; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.5em; padding-left: 0.5em; padding-right: 0.5em; padding-top: 0.5em; text-align: left; width: 400px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/_-pZI7Bl38cw/SxgUOMM4ufI/AAAAAAAAAAw/tH7GqBH7BFw/s1600-h/figure.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/_-pZI7Bl38cw/SxgUOMM4ufI/AAAAAAAAAAw/tH7GqBH7BFw/s400/figure.png" /></a></div><small>Figure 2. Quantization and dithering from float to int and back to float is tested at 16 bits (a,b) and 24 bits (c,d) using a full-scale sine (a,c) and the sum of two sines (b,d). Notes: the sum of two sines does not clip; clipped signal and raw quantized signal are not shown in a.</small></div></div>Figure 2 shows the dynamic performance of conversion using 2^n, (2^n)-1 and "asymmetrical" conversion (i.e. Method 4 from my previous post). We will discuss below why "asymmetrical" is a misnomer. We also looked at dithered and non-dithered versions.<br />
<br />
Two types of tests were run: first, a full-scale sine wave was generated, converted to int, and back to float for FFT analysis. The second test was the same except that two sines, each at 1/2 full scale, were summed together. Each test was run at 16 and 24 bits. Note that the full-scale sine wave cannot be accurately represented in some of these conversion methods, resulting in some clipping.<br />
<br />
As you can see, all dithered converters performed fine at 16-bit as long as nothing was out of scale. At 24-bit, the weakness of the (2^n)-1 converter becomes clear: it actually performs worse than rounding (i.e. no dithering). Clearly (2^n)-1 is not an acceptable transformation between 24-bit integers and single-precision floating point numbers. The 2^n converter performed admirably on all tests except the 16-bit full-scale test (1a). Those small spikes line up perfectly with the spikes caused by clipping, as expected (results not shown), meaning that it is harmonic distortion -- not the worst thing that could happen, but, still, the asymmetric converter does outperform it in this regard.<br />
<br />
As mentioned, I'm calling Method 4 from my previous post the "asymmetric" method, but it is only asymmetric in the sense that you apply different math to positive and negative numbers. As these results show, it <i>is</i> linear. Moreover, it is symmetric with respect to dither amplitude, which is what ensures its linear behavior.<br />
<br />
<span style="font-size: x-large;">Conclusions</span><br />
<br />
Clearly the two winners here are the so-called asymmetric method and the (2^n) method. Both methods excel in the critical areas of bit transparency and linearity. Even their un-dithered performance is quite good, and they are obviously superior to other methods.<br />
<br />
The one area in which the asymmetric method outperforms the (2^n) method is in terms of clipping signals that originated from higher resolution. Even with dither, we still see incorrect behavior with the (2^n) model because dither only reaches 1/2 LSB, whereas +1 clips by going 1 LSB over. The question is whether or not this matters. Indeed <a href="http://lists.apple.com/archives/coreaudio-api/2009/Dec/msg00046.html">there is some debate about the importance of +1</a>. My opinion? +1 is a value that occurs in the real world, and it's not always possible for the code that's producing the +1 to know what the output resolution is going to be. For example, a VST synth plugin has no way of knowing what the output resolution will be, so it can't be expected to know what to scale its output to. When converting from 24 bit to 16 bit and using float as an intermediary, there is no simple way to solve this problem.<br />
<br />
On the other hand, non-pro A/D converters frequently clip around -.5 dBFS, which is below +1 - 1 LSB anyway. Conceivably, you could also correct for this by introducing a level shift at the output equal to 1/2 LSB, but that's equivalent to turning your converter into a (2^n)-.5 converter -- it solves one problem, but introduces another. All that said, there is no reason not to develop software -- especially libraries, drivers and other software intended for use by multiple types of users, including audiophiles and pro audio engineers -- that is convenient to use while meeting the highest audio standards: just use the asymmetric converters.<br />
<br />
Given the potential hazards found in mixing and matching conversion methods, I recommend that all libraries (and drivers, if possible) offer options for various conversion settings, both to minimize bit transparency problems and unnecessary quantization noise, until all libraries and drivers can standardize on the asymmetric conversion method. This is the only way to guarantee transparency and maximize linearity. As these results show, this issue may be more important than dither.<h2>Int->Float->Int: It's a jungle out there!</h2>It turns out that the simple operation of converting from float to integer and back is not so simple. When it comes to audio, this operation should be done with care, and most programmers do, in fact, put a lot of thought into it. The problem most programmers observe is that audio, when stored (or processed) as an integer, is usually stored in what's called "two's complement" notation, which always gives us 1 more negative number than positive. When we process or store floating point numbers, we use a nominal range of -1 to +1.<br />
<br />
The fact that there are more negative numbers than positive numbers has caused some confusion amongst programmers, and a number of different conversion methods have been proposed. Here is my survey of how a number of existing software and hardware packages handle this conversion. In these examples, I show conversions for 16-bit integers, but they all extend in the obvious way to other bit depths. It is important to consider how these methods extend to larger integers, especially 24-bit integers, so I've tested bit transparency for these methods up to 24-bit using single-precision floating point intermediaries, correcting for the fact that IEEE allows extended precisions to be used in computations. Endianness is irrelevant here, because everything works for big and little endian systems.<br />
<br />
Transparency is only required or possible when the data has not been created synthetically or altered via DSP (including such simple operations as volume changes, mixing, etc). In cases where transparency is not possible, dither must be applied when converting to integer or reducing the resolution. In many software packages it is up to the end-user to make this determination and manually switch dither on or off. In my next post I will discuss dithering and linearity.<br />
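Where transparency is impossible and dither must be applied, a common choice is TPDF (triangular) dither: add two independent uniform random values of up to ±1/2 LSB each before rounding. A C sketch, using the 0x8000 scaling of Method 1 purely for illustration (helper names are mine):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Uniform random value in [-0.5, 0.5], in units of one LSB. */
static double halfLsbNoise( void )
{
    return rand() / (double)RAND_MAX - 0.5;
}

/* Convert a float in [-1, 1] to a 16-bit sample with TPDF dither:
   two uniform noise sources sum to a triangular distribution. */
int16_t floatToInt16Dithered( float f )
{
    double x = f * 0x8000;                 /* Method 1 scaling */
    x += halfLsbNoise() + halfLsbNoise();  /* TPDF dither */
    long n = (long)( x < 0 ? x - 0.5 : x + 0.5 ); /* round */
    if( n > 32767 ) n = 32767;             /* clamp to 16-bit range */
    if( n < -32768 ) n = -32768;
    return (int16_t)n;
}
```

Without the dither, low-level signals would be truncated into a repeating, signal-correlated error; the dither decorrelates that error into benign noise.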
<br />
<table border="0" bordercolor="#000000" cellpadding="2" cellspacing="0" class="" style="width: 100%;"><tbody>
<tr><th style="background-color: #104386; color: white; text-align: left;"><br />
</th><th style="background-color: #104386; color: white; text-align: left;" width="25%">Int to Float<br />
</th><th style="background-color: #104386; color: white; text-align: left;" width="25%">Float to Int*<br />
</th><th style="background-color: #104386; color: white; text-align: left;" width="25%">Transparency<br />
</th><th style="background-color: #104386; color: white; text-align: left;" width="25%">Used By<br />
</th></tr>
<tr><td>0)<br />
</td><td width="25%">((integer + .5)/(0x7FFF+.5)<br />
</td><td width="25%">float*(0x7FFF+.5)-.5<br />
</td><td width="25%">Up to at least 24-bit<br />
</td><td width="25%"><span style="font-size: small;">DC DAC Modeled</span><br />
</td></tr>
<tr><td>1)<br />
</td><td width="25%">(integer / 0x8000)<br />
</td><td width="25%">float * 0x8000<br />
</td><td width="25%">Up to at least 24-bit<br />
</td><td width="25%"><span style="font-size: small;">Apple (Core Audio)</span><sup><span style="font-size: small;">1</span></sup><span style="font-size: small;">, ALSA</span><sup><span style="font-size: small;">2</span></sup><span style="font-size: small;">,</span><span style="font-size: small;"> MatLa</span><span style="font-size: small;">b</span><sup><span style="font-size: small;">2</span></sup><span style="font-size: small;">, sndlib</span><sup><span style="font-size: small;">2</span></sup><br />
</td></tr>
<tr><td>2)<br />
</td><td width="25%">(integer / 0x7FFF)<br />
</td><td width="25%">float * 0x7FFF<br />
</td><td width="25%">Up to at least 24-bit<br />
</td><td width="25%"><span style="font-size: small;">Pulse Audio</span><sup><span style="font-size: small;">2</span></sup><br />
</td></tr>
<tr><td>3)<br />
</td><td width="25%">(integer / 0x8000)<br />
</td><td width="25%">float * 0x7FFF<br />
</td><td width="25%">Non-transparent<br />
</td><td width="25%"><span style="font-size: small;">PortAudio</span><sup><span style="font-size: small;">1,</span></sup><sup><span style="font-size: small;">2</span></sup><span style="font-size: small;">, Jack</span><sup><span style="font-size: small;">2</span></sup><span style="font-size: small;">, libsndfile</span><sup><span style="font-size: small;">1,3</span></sup><br />
</td></tr>
<tr><td>4)<br />
</td><td width="25%">(integer>0?integer/0x7FFF:integer/0x8000)<br />
</td><td width="25%">float>0?float*0x7FFF:float*0x8000<br />
</td><td width="25%">Up to at least 24-bit<br />
</td><td width="25%"><span style="font-size: small;">At least one high end DSP and A/D/A manufacturer.</span><sup><span style="font-size: small;">2,4</span></sup><span style="font-size: small;"> XO Wave 1.0.3.</span><br />
</td></tr>
<tr><td>5)<br />
</td><td width="25%">Uknown<br />
</td><td width="25%">float*(0x7FFF+.49999)<br />
</td><td width="25%">Unknown<br />
</td><td width="25%"><span style="font-size: small;">ASIO</span><sup><span style="font-size: small;">2</span></sup><br />
</td></tr>
<tr><td colspan="5" style="background-color: #104386; color: white; text-align: left;" width="100%"><small>*obviously, rounding or dithering may be required here.</small><br />
<span style="font-size: small;"><span style="font-size: 13px;">Note that in the case of IO APIs, drivers are often responsible for conversions. The conversions listed here are provided by the API.</span></span><br />
</td></tr>
</tbody></table><br />
Method 0 is one possible method for preserving the DC accuracy of a DAC, and is included here for reference.<br />
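As a concrete sketch of Method 4, here is the 16-bit case in C (rounding shown without dither; helper names are mine):

```c
#include <assert.h>
#include <stdint.h>

/* Method 4: positive values scale by 0x7FFF, negative by 0x8000,
   so +1.0 and -1.0 both map exactly onto the extreme 16-bit values. */
float int16ToFloatAsym( int16_t s )
{
    return s > 0 ? s / (float)0x7FFF : s / (float)0x8000;
}

int16_t floatToInt16Asym( float f )
{
    double x = f > 0 ? f * (double)0x7FFF : f * (double)0x8000;
    return (int16_t)( x < 0 ? x - 0.5 : x + 0.5 ); /* round to nearest */
}
```

This round trip is bit transparent: checking all 65536 sample values exhaustively shows each one comes back unchanged, and both +1.0 and -1.0 convert without clipping.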
<br />
Edited December 6, 2009: Fixed Method 3. (0x8000 and 0x7FFF were backwards)<br />
<br />
<span style="font-size: x-small;">Sources:</span><br />
<sup><span style="font-size: x-small;">1</span></sup><span style="font-size: x-small;"> Mailing list</span><br />
<sup><span style="font-size: x-small;">2</span></sup><span style="font-size: x-small;"> Perusing the source code (this, of course, is subject to mistakes due to following old, conditional or optional code)</span><br />
<sup><span style="font-size: x-small;">3</span></sup><span style="font-size: x-small;"> libsndfile FAQ goes into detail about this.</span><br />
<sup><span style="font-size: x-small;">4</span></sup><span style="font-size: x-small;"> Personal communication.</span>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com4tag:blogger.com,1999:blog-7225698277211840079.post-67202494704884617452009-11-11T10:46:00.000-08:002009-11-11T18:53:38.778-08:00WAVE64 vs RF64 vs CAFRight now I am choosing a new default internal audio file format for XO Wave, and I'd like to choose a format that offers large file sizes and high resolution. I'd like to use an existing popular standard rather than inventing my own or using RAW audio. The pro audio industry is finally moving towards 64-bit file formats, and the three options supported by most pro software are<br />
<br />
<ul><li>Wave64, aka Sony Wave64, originally developed by Sonic Foundry before 2003, is an open standard and a true 64-bit format: all 32-bit fields are replaced with 64-bit fields, and all chunks are 8-byte word aligned. Instead of the dreaded FourCC it uses GUIDs. Other than that, it is pretty much the same as WAV, so the spec is barely 4 pages long, although in my opinion it could stand to be a bit longer, as many aspects of WAV are so poorly devised that it really wouldn't hurt for someone to put it all in one place. <a href="http://www.hydrogenaudio.org/forums/lofiversion/index.php/t35550.html">Some people have criticized</a> the use of GUIDs on the grounds that there will never be that many chunks, but this misses the point: the point of using GUIDs is that anyone can define their own chunk without having to check with Sony or register a chunk ID. It's actually rather clever.</li>
<li>RF64 was proposed in 2005 by the EBU <a href="http://www.sr.se/utveckling/tu/bwf/prog/RF_64v1_4.pdf">with full knowledge of Wave64</a>. Although the proposal stated basic requirements that could have easily been met by a few minor extensions to Wave64, and they stated a desire to "join forces" with the developers of Wave64, they made no effort to do so other than to say they hoped they'd be involved. Moreover, the same document proposes RF64 as an alternative, incompatible 64-bit extension to the WAV format. Unlike Wave64, RF64 is not a true 64-bit format. All existing "chunks" remain 32-bit, so, for example, markers, regions and loops will no longer work past a certain number of samples. Even EBU's levl chunk will not work with RF64 because it uses a 32-bit address for pointing to the "peak-of-peaks" in the raw data. RF64 offers the much-touted promise of backwards compatibility via a "junk chunk", but, of course, this is possible with Wave64 as well, as pointed out in the Wave64 spec.</li>
<li>CAF, or Core Audio Format, was Apple's entry into the ring. Apple didn't want to be left out of the 64-bit game, after all, and around the same time in 2005 they released CAF. Since they are Apple, they figured people would adopt it (Logic would, if no one else), even if there were competing specs. Their approach, however, was to start from scratch, and it's pretty refreshing. Indeed, the spec addresses practical issues to ensure that important features are implemented, and it even makes that tiny little bit of extra effort required to avoid file corruption by not requiring a header rewrite to finalize a recording of unknown length (Anyone who's ever recorded using software knows that once in a while something goes wrong and a file ends up corrupted. It's so nice that someone finally addressed this in a spec.).</li>
</ul>The WAVE format is problematic in many, many ways. For example, in some places it uses zero-based indexing, in others it uses one-based indexing. Sometimes it uses signed integers for raw audio data, other times unsigned. That may not seem so bad until you consider how simple the data it's trying to carry is; when you add to that the fact that Microsoft had to use format extensions just to clear up ambiguous documentation (and they've still got an ambiguously documented "fact" chunk), it's really not good territory. It is a shame that both Sonic Foundry/Sony and the EBU chose WAVE as the format to extend. Moreover, it's annoying that EBU designed their own, incompatible 64-bit extension to WAVE when a superior one already existed.<br />
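To make the chunk-header difference concrete, here is a rough C sketch of the two layouts as I read the specs (the struct names are mine): the classic FourCC header with its 32-bit size, versus Wave64's GUID header with a 64-bit size.

```c
#include <stdint.h>

/* Rough sketch of the two chunk-header layouts (struct names are mine).
 * Wave64 replaces the 4-byte FourCC with a 16-byte GUID and the 32-bit
 * size with a 64-bit one, and pads chunk bodies to 8-byte boundaries. */
typedef struct {
    char     id[4];     /* FourCC, e.g. "data" */
    uint32_t size;      /* 32-bit size: caps chunks near 4 GB */
} RiffChunkHeader;

typedef struct {
    uint8_t  guid[16];  /* anyone can mint a GUID; no registry needed */
    uint64_t size;      /* 64-bit size lifts the 4 GB cap */
} W64ChunkHeader;
```

The wider ID field is the point of the criticism quoted above: 16 bytes is overkill for a handful of chunk types, but it removes the need for a central registry.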
<div><br />
</div><div>Some people think the whole "backwards compatibility" thing is a bunch of hooey because it puts an undue burden on the people writing the libraries. Erik de Castro Lopo, author of the popular LGPL'ed libsndfile, says:<br />
</div><div><br />
</div><div><blockquote>Quite honestly, its stuff like this that makes me think the people who write these specs smoke crack!<br />
</blockquote><blockquote>If I were to follow the ... insane advice [about retaining backwards compatibility], the test suite would have to write > 4Gig files in order to write a real RF64 file instead of just a normal WAV file.<br />
</blockquote><blockquote>In order to avoid this insanity, libsndfile, when told to write an RF64 file does exactly as its told.<br />
</blockquote><div>I would add that the backwards compatibility adds another point of failure in the recording process, in the same way that header rewrites are a point of failure in most current formats (except for CAF and "chunkless" formats like RAW and AU).<br />
</div><div><br />
</div><div>All that aside, RF64 is gaining some popularity and support -- probably more than Wave64. As for CAF, it's less popular, but since it's an Apple standard it's probably not going anywhere even if it's not going to be the "next big thing." It could be a fine place to work from, but even a quick scan of the docs turned up a few issues that worried me. For example:<br />
</div><div><br />
<br />
<br />
<ul><li>The CAFMarker data-type has three design flaws I noticed. One is that the frame position is a floating point number. I might be missing something here, but in a format where everything else counts frames and bytes as 64-bit integers, why are we suddenly using floats? Sure, a 64-bit float is exact for integers up to pretty big numbers, but it's still a float. I didn't use a format like this to get pretty accurate big numbers when I could get completely accurate big numbers! Internally, most apps are going to be converting 64-bit integers to 64-bit floats, which is insane. Another problem is mChannel, which is the channel (starting at 1) that the marker refers to, or zero if the marker refers to all channels. Okay, seems reasonable, except that the spec also defines a channel mapping with a 32-bit channel layout bitmask. Why not use that? Granted, you might have more than 32 channels, but that's not going to be the most common case, and you could give your users a choice. Consistency is important in APIs. Also, let's face it, the CAFMarker, if not all the basic chunks, should be versioned and extensible. Sure, all that takes a few more bits (well, not the float/integer thing), but it's really nothing compared to the sea of data in most audio files.</li>
<li>In the SMPTE timecode types they define <span style="font-family: Monaco, Courier, Consolas, monospace; font-size: 11px;">kCAF_SMPTE_TimeType30Drop</span>. Now, the fact is that there's really no such thing as 30 Drop, but I can see an argument for including it for completeness. However, the documentation states that: "30 video frames per second, with video-frame-number counts adjusted to ensure that the timecode matches elapsed clock time." Which is wrong. If you actually had 30 Drop it would run ahead of elapsed, or "wall-clock" time. "Aha!" you say, "they really mean 29.97 Drop, which is often just called 30 Drop because <i>everyone</i> knows there's no such thing as 30 Drop." But I'm afraid you are wrong, because there's another constant for that, <span style="font-family: Monaco, Courier, Consolas, monospace; font-size: 11px;">kCAF_SMPTE_TimeType2997Drop</span>, with pretty much the same documentation, only in this case it's correct to say that the timecode matches elapsed time (well, it's very close anyway).</li>
</ul><div>So CAF might be flawed, but probably no more so than WAVE and anything built on it. The reliability factor is sweet. Really. The fact that many people, especially in broadcast, seem to want RF64 support is a drawback, though.<br />
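The float-frame-position complaint above can be quantified: an IEEE double has a 53-bit significand, so it represents integers exactly only up to 2^53. This little helper (the name is mine) shows where a frame count and its successor stop being distinguishable after a round trip through a double:

```c
#include <stdint.h>

/* A frame position stored in a double is exact only while the frame
 * count fits in the 53-bit significand of an IEEE double. This helper
 * (name is mine) reports whether a frame count and its successor
 * survive the round trip through a double as distinct values. */
static int is_exact_in_double(int64_t frames)
{
    double d = (double)frames;
    return (int64_t)d == frames && (int64_t)(d + 1.0) == frames + 1;
}
```

In practice 2^53 frames is astronomically long even at high sample rates; the sharper cost is the one the post notes: apps must round-trip 64-bit integers through doubles.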
</div><div><br />
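The 30 Drop vs. 29.97 Drop distinction can be checked with arithmetic. Drop-frame numbering skips labels :00 and :01 at the start of every minute except each tenth minute, so one timecode hour spans 107892 frames. At 29.97 (30000/1001) fps that is almost exactly an hour of real time; at a true 30 fps it is only 3596.4 seconds, so the timecode would run ahead of the clock, as argued above. A sketch of the standard conversion (struct and function names are mine):

```c
#include <stdint.h>

/* Convert a frame count to drop-frame timecode (names are mine).
 * Frames :00 and :01 are skipped at the start of every minute except
 * each tenth minute: 108 labels dropped per hour, so 107892 frames
 * label one timecode hour. */
typedef struct { int hh, mm, ss, ff; } Timecode;

static Timecode frames_to_dropframe(int64_t frame)
{
    const int64_t perMin   = 30 * 60 - 2;        /* 1798 labels in a dropped minute   */
    const int64_t per10Min = 9 * perMin + 1800;  /* 17982: tenth minute drops nothing */
    int64_t tens = frame / per10Min;
    int64_t rest = frame % per10Min;
    /* re-insert the skipped label numbers, then read out digits */
    frame += 18 * tens + (rest > 1 ? 2 * ((rest - 2) / perMin) : 0);
    Timecode tc;
    tc.ff = (int)(frame % 30);
    tc.ss = (int)(frame / 30 % 60);
    tc.mm = (int)(frame / 1800 % 60);
    tc.hh = (int)(frame / 108000);
    return tc;
}
```

For example, real frame 1800 (one minute in) is labeled 00:01:00:02, and frame 107892 is labeled exactly 01:00:00:00.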
</div>Of course, I might just be over-engineering it. The AU format has been around forever, is super simple and provides high-resolution, uncompressed audio of ANY length (it's not even limited to 64-bit). On the other hand, it lacks metadata, which might be useful for BWF-style info as well as region data, but hey, it's wicked simple.<br />
</div><div><br />
<br />
An interesting side note is that by choosing an appropriately sized junk/empty chunk in the header, Wave64, RF64 and CAF files can actually be converted from one format to another in place.<br />
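This placeholder trick is how RF64 writers typically keep a file readable as plain WAV: a "JUNK" chunk reserves bytes that a later in-place rewrite (for RF64, the ds64 chunk) can reuse. A sketch of writing such a placeholder; the function name, buffer layout and the chosen body size are my own illustration:

```c
#include <stdint.h>
#include <string.h>

/* Sketch: write a RIFF "JUNK" placeholder chunk into buf (function
 * name is mine). Readers skip it; a later in-place rewrite can reuse
 * the reserved bytes, e.g. for an RF64 ds64 chunk. Returns the total
 * bytes written (8-byte header plus body). */
static size_t write_junk_chunk(uint8_t *buf, uint32_t bodySize)
{
    memcpy(buf, "JUNK", 4);                 /* FourCC of a throwaway chunk */
    buf[4] = (uint8_t)(bodySize      );     /* 32-bit size, little-endian  */
    buf[5] = (uint8_t)(bodySize >>  8);
    buf[6] = (uint8_t)(bodySize >> 16);
    buf[7] = (uint8_t)(bodySize >> 24);
    memset(buf + 8, 0, bodySize);           /* body content is ignored     */
    return 8 + bodySize;
}
```

The key design point is that the reserved region must be at least as large as whatever chunk will later replace it, which is why the formats can be inter-converted in place only if the padding was chosen with that in mind.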
</div></div>Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com1tag:blogger.com,1999:blog-7225698277211840079.post-7144475287187553432008-10-23T00:22:00.000-07:002009-11-11T08:27:26.810-08:00Signs that your cat may have a drug or alcohol problem[caption id="attachment_152" align="alignright" width="158" caption="Adorable behavior such as this may be a sign of something more sinister!"]<a href="http://stuff.bjornroche.com/old-blog-uploads/2008/10/photo.jpg"><img class="size-medium wp-image-152 " title="Warning sign of cat alcohol or drug abuse" src="http://stuff.bjornroche.com/old-blog-uploads/2008/10/photo-225x300.jpg" alt="Adorable behavior such as this may be a sign of something more sinister!" width="158" height="210" /></a>[/caption]<br/><br/>If your cat is using alcohol and drugs, it's a good bet they're doing everything possible to keep that activity hidden. The last thing they want is for their owners to give them a "hassle" about their newly-found "entertainment." But continued alcohol and drug use will affect your cat's behavior, attitudes and even choice of friends.<br/><br/>Here are some signs to look for, if you think that your cat may be using:<br/><br/><strong>Mood Swings</strong><br/><ul>Most cats go through normal mood swings. But look for extreme changes -- one minute happy and giddy followed by withdrawal, or fits of clawing at the furniture more than usual. Is your cat both aloof and prone to acting out? These are possible signs of drug and alcohol abuse. </ul><br/><strong>New Friends?</strong> <br/><ul>If your cat is using, chances are they will begin hanging out with other cats with similar interests. Has your cat suddenly turned away from their old friends? Are they hanging out with an older (driving age) group or with those that you suspect are using drugs? </ul><br/><strong>Physical Health</strong> <br/><ul>Have you noticed a change in appetite? Is your cat extremely finicky about the food they eat? 
Does your cat have strange or inexplicable sleeping patterns? </ul><br/><strong>"Evidence"</strong><br/><ul>Have you noticed anything missing lately? Alcoholic beverages? Maybe some really neat sounding pill bottles? </ul><br/><strong>Attitude</strong> <br/><ul>Has your cat suddenly developed a complete disregard for others? Does your cat get in your way or have a "bad attitude"? Is your cat evasive or troublesome, perhaps hiding when they know they've done something bad?</ul><br/><strong>Little Things</strong> <br/><ul>Have you noticed a change in hairstyle or "fashion" choices? Are they suddenly using breath mints consistently? Is your cat suddenly very secretive? </ul><br/><strong>Overt Signals</strong> <br/><ul>Has anyone ever told you your cat is drinking or using drugs? Do you know that they have "experimented"? Have they suddenly developed the need for additional money, for vague or unexplained reasons? Have you ever seen them stagger? Or have you noticed any slurred speech? Changes in the pupils of their eyes, or redness or bloodshot eyes? </ul><br/>Again, many of these changes could be attributed to normal cat behavior. But if you have noticed a pattern of several of these "signals" your cat may be using alcohol or drugs.<br/><br/> <br/><br/>thanks to <a href="http://alcoholism.about.com" target="_blank">alcoholism.about.com</a> for.... virtually all of this text.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com3tag:blogger.com,1999:blog-7225698277211840079.post-68721451990049476412008-10-03T06:42:00.000-07:002009-11-11T08:27:26.813-08:00My Wikipedia Entry for Gossip GirlWhile everyone else is obsessed with the idea of teens getting it on (OMG! Teenagers having Sex! Can it be?) 
I couldn't help but notice something else about the show based on a school affiliated with my own <a href="http://www.ivygateblog.com/2007/11/according-to-the-wsj-collegiate-school-is-best-ivy-feeder/">snotty-a#$@ed</a> childhood alma mater, so I wrote my own wiki entry...<br/><h3>Gossip Girl (TV Series)</h3><br/><div style="width: 240px; float: right; border: 1px solid #666; font-size: 80%; text-align: center; margin: 5px; padding: 5px;"><img title="Jessica Szohr. In the eugenically sterilized world of Gossip Girls, black people have been improved: their new eye color is blue!" src="http://stuff.bjornroche.com/old-blog-uploads/2008/08/220px-jessica_szohr_lf.JPG" alt="Jessica Szohr. In the eugenically sterilized world of Gossip Girls, black people have been improved: their new eye color is blue!" />In addition to lots of hot white characters, Gossip Girl also features a hot non-white character who plays Dan's BFF. (She even looks less white in other pictures!) If you believe she grew up in a scary part of Brooklyn, I'll sell you a bridge to that borough.</div><br/>Gossip Girl is an American television teen drama based on the popular novel series of the same name written by Cecily von Ziegesar. Gossip Girl revolves around the lives of socialite young adults growing up in New York's least exciting neighborhood, the Upper East Side. The characters attend elite academic institutions while dealing with sex, drugs, jealousy, and other issues faced by today's rich white kids (oh, right, and Dan's BFF, what's-her-name<sup>1</sup>. She has to deal with pretending not to fit in, despite being from scary Brooklyn). Issues not addressed in detail include: depression, college admission, school work, money, medical problems, emotional problems, etc. 
(fortunately, that TV-unfriendly stuff doesn't happen to these kids, even, surprisingly, anorexia).<br/><br/>Despite the appearance of <del>non-WASP characters</del> a non-WASP character, the audience is able to fantasize themselves into an all-white world (just like <a href="http://en.wikipedia.org/wiki/Groton_School">Groton</a>'s founder <a href="http://en.wikipedia.org/wiki/Endicott_Peabody_(educator)">Endicott Peabody</a> dreamed of, too!) because that character at least has blue eyes and is really hot and has a <a href="http://thetvaddict.com/2008/04/28/we-gossip-with-gossip-girl-star-jessica-szohr/">"normal"</a> job working in a cafe (off-camera she's a model).<br/><br/>Apparently, the writers attended New York City private schools either in 1947 or in the South, because several of the episodes depict things like coming-out cotillions (I know the characters are too old for bar/bat mitzvahs, but cotillions? WTF? How about the <a href="http://query.nytimes.com/gst/fullpage.html?res=9B0DE0D9163DF931A15751C1A961948260">Gold and Silver Ball</a> next season?).<sup>2</sup><br/><br/>Anyway, relax, this show is totally race-sanitized: no major Asian characters or Black characters (unless they have blue eyes, exotic sounding names and can pass for children of characters from the '90s hit show <a href="http://en.wikipedia.org/wiki/Aisha_Tyler">Friends</a>), and definitely no Jews. Not the Upper East Side I remember from my high school days, but it's definitely the Upper East Side many of my friends' parents probably dreamed about. It's sort of like a eugenics project, only it's not real, so you can feel good about that, too, because some of those schools really are eugenics projects and that's more than a little creepy.<br/><h4>Criticism</h4><br/>The show's critics allege that the show raises some adult themes. What exactly constitutes an adult theme? 
According to <a href="http://en.wikipedia.org/wiki/Calvin_and_Hobbes">Bill Watterson</a>, it's things like doing paperwork, filing, etc. Indeed, the show has that: Nate becomes involved in the family business and so on, so I advise parents to use caution when allowing their children to watch this show.<br/><h4>Season 2</h4><br/>Rumor has it that more adult themes are in store next season. Perhaps they'll be doing taxes, or learning right from wrong or even drinking <a href="http://en.wikipedia.org/wiki/Sex_and_the_City">adult drinks</a> like cosmos? Perhaps the smarter ones will grow up properly and turn into <a title="Hot Intellectual Chicks" href="http://hotintellectualchicks.com" target="_blank">chicktellectuals</a>.<br/><h4>Discussion Questions</h4><br/><ul><br/> <li>Wasn't there an Asian chick with no lines in the pilot?</li><br/> <li>What's more boring, a blog about a TV show or a TV show about a blog?</li><br/> <li>If you were going to have a bitchy argument synchronized to a song, which song would it be?</li><br/> <li>In what way is Jessica Szohr's character not white?<sup>3</sup></li><br/></ul><br/><h4>Notes</h4><br/><sup>1</sup> It's hard to remember the character's name when <a href="http://en.wikipedia.org/wiki/Gossip_girl_episodes">she's missing from more episodes than any other character</a>. 
I wonder why that is.<br/><sup>2</sup> Okay, truth be told, there were a few cotillions, but nobody got excited about them except the young ladies' mothers.<br/><sup>3</sup> We're looking for either everyone talks about how great she is in order to feel accepting of her, or she provides advice and, you might say, spiritual guidance to the <del>real</del> other characters.Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com2tag:blogger.com,1999:blog-7225698277211840079.post-73348671052378952262008-03-24T01:35:00.000-07:002009-11-11T08:27:26.846-08:00Input Lists (Corrections)In <a href="http://blog.bjornroche.com/?p=111">Part II</a> of my series on Stage Plot input lists I got a few of my facts wrong. Not about input lists, but about the band I picked on, <a href="http://www.kcfunk.com/">Kansas City Funk Syndicate</a>. It wasn't because David Freeland, who had originally contacted me with some excellent questions about his band's input list, was unclear, it was just that so much time had passed between asking his questions and me actually blogging about them. I forgot some important facts and got confused, so here are some clarifications.<br/><br/>One thing I wrote was that Funk Syndicate should buy a new mixer. This would have been good advice had my version of reality been the right one. David clarified that their mixer <em>can</em> reorder inputs -- the reason their input list seemed "out of order" was because of "ergonomics on the [mixer] and the money channels since some of our preamps are better than others at the moment. A lot of the channel moving is all of the relabeling more that the capabilities of the mixer." In short, they could have changed the channel order but it takes time to relabel their mixer and their outboard gear. (I am also guessing that by ergonomics they mean that they wanted the important stuff on one "page" of their mixer, which only has 16 physical faders, but at least 32 physical inputs). 
Makes perfect sense.<br/><br/>Still, it presents a problem for the FoH mixer who isn't used to finding most vocals at one end of the mix and one lonely vocal at the other. Compound that with something that I am guessing happened: Funk Syndicate sent their stage plot and Input List to the venue and explained everything, including their unusual input list, the venue said it would be no problem, but failed to communicate it to the mix engineer in advance, and the mix engineer got caught by surprise while Funk Syndicate thought he knew what the deal was going to be. I'm just guessing, but that kind of thing happens all the time.<br/><br/>I don't have a great solution to the issue without actually seeing the whole rig, and knowing all the concerns, but here's what I suggested to David:<br/><br/><blockquote><br/>If it were me, the first thing I would consider would be simplifying the rig -- sometimes less is more. Personally, I have never heard the difference between high quality and very high quality preamps in a live situation, but you guys might be hearing it -- especially with in-ears, or if the preamps are actually full channel strips or you've got other outboard. Or just give in and lose a little of that on your money channels. Labeling can also be an issue. You could try two colors of sharpie (one for each setup).<br/></blockquote><br/><br/>I don't know if this is workable -- every band is different and sometimes it can be difficult, emotionally, to have spent hard-earned cash on high-end gear and not use it because it doesn't fit ergonomically into the rig. Something else to consider might be to give your venue two input list options, and let them choose. This could backfire because you are giving them choices they are not used to, so you'd have to find a concise way to explain the difference. 
You'd also want to be able to switch between them easily, preferably on site, because the odds of the venue actually passing the question on to the FoH engineer and getting back to you with his or her answer are small.<br/><br/><center> * * * </center><br/><br/>Well, David has a tricky situation, no doubt, and I'm not sure I've done much to solve the problem because there is no easy answer. If you find yourself in a similar situation, remember to be as communicative, open and clear as you can be and remember to anticipate problems and be ready for them. A weird input list is one potential problem, so if you have a good reason for not following the usual rules, make sure to communicate those reasons, and, if you can, have a <a href="http://www.xowave.com/tees">backup plan</a>!Bjorn Rochehttp://www.blogger.com/profile/17072425815152893296noreply@blogger.com0tag:blogger.com,1999:blog-7225698277211840079.post-68208031501191891112008-03-17T02:01:00.000-07:002009-11-11T08:27:26.849-08:00Input Lists (The Other Part of Your Stage Plot) Part II<img align="right" src="http://stuff.bjornroche.com/old-blog-uploads/2007/12/lolsounds2.jpg" width="200" height="110" title="A good input list helps the stage cats patch you in right!"/>In <a href="http://blog.bjornroche.com/?p=105">part I</a> of this post, I covered the basics of input lists. I noted in that article that most people don't need to worry too much about the actual numbers they assign to their inputs in their input lists (aka the channel numbers), because most engineers won't pay them much attention. 
I recently got an email from David Freeland of the Kansas City <a href="http://www.kcfunk.com">Funk Syndicate</a> who had a question about the channel numbers on his input lists, and I'll address that in this posting, but many readers may simply want to read part I, since channel numbers are not relevant to most bands.<br/><br/><em>NOTE: while this post still stands on its own, it turns out I got some facts wrong about KC Funk Syndicate. For corrections, see <a href="http://blog.bjornroche.com/?p=114">Input Lists (Corrections)</a></em><br/><br/><h2>Why your channel numbers generally don't matter</h2><br/><br/>In general, you might think it would be easy for a mix engineer to adhere to your request for channel numbers. In reality, different mixer configurations and the desire to have continuity between sets (especially at <a href="http://blog.bjornroche.com/?p=7">festivals</a>) make this difficult. For example, a typical small format mixer looks like this:<br/><br/><center><br/><img id="image107" src="http://stuff.bjornroche.com/old-blog-uploads/2008/02/1642vlz3-lrg.jpg" alt="16 channel mixer" /><br/></center><br/><br/>but most large format live consoles don't look like that, because there are too many channels and you want the important stuff (the mains and the busses) in the middle where you have the best access to them. So, large format live consoles look like this:<br/><br/><center><br/><img id="image108" src="http://stuff.bjornroche.com/old-blog-uploads/2008/02/gl2800_lrg.jpg" alt="Large format live console" /><br/></center><br/><br/>The upshot is that building from left to right (which is how you set up your input list) is <em>not</em> the only obvious way to go. 
On the big mixer pictured above, for example, I might put my lead instruments and auxes on the first 24 channels (which are to the left of the main section), and the drums, bass, and other rhythm instruments on channels 25-48, which wouldn't make any sense at all except to someone who was looking at the same mixer. Another way to go might be to alternate bands on the left and right sides of the mixer, which might seem strange, but I heard of one festival which had a rotating stage, and they might want to do that to swap out bands faster -- especially if they had two engineers sharing the same console!<br/><br/>Don't even think about trying to organize your input list in anticipation of this, though, because you won't know where the split on the mixer occurs, how your engineer likes to split things up, or what the other acts will be. Even if you did, your input list would cease to be a logically grouped, organized reminder of the important sound sources of your band, which is the whole point of the input list.<br/><br/><h2>KC Funk Syndicate's Stage Plot</h2><br/><br/>Let's take a look at KC Funk Syndicate's Stage Plot (Contact Info is Blacked out):<br/><br/><table><br/><tr><br/><td rowspan="2"><img id="image106" src="http://stuff.bjornroche.com/old-blog-uploads/2008/02/kcfunkstageplot.thumbnail.png" alt="KCFunkStagePlot" /></td><br/><td><a href="http://stuff.bjornroche.com/old-blog-uploads/2008/02/kcfunkstageplot.png">Open in this window</a></td><br/></tr><br/><tr><br/><td><a target="_blank" href="http://stuff.bjornroche.com/old-blog-uploads/2008/02/kcfunkstageplot.png">Open in new window</a></td><br/></tr><br/></table><br/><br/>The first thing you'll notice, aside from the fact that they've got a very large band with a very well organized stage plot, is that their input list is out of order. Their vocalists are at the top (good!), except for one, Karl, who is close to the bottom (bad!). 
There are also some channels that are not strictly inputs, but rather talkback channels and so on (confusing!).<br/><br/>I asked David about this and he explained that they really wanted the channel numbers assigned to the numbers he gave them because they were supplying their own monitor system and engineer, and the venue (a large casino) was supplying FoH system and engineer. Now, I'm sure the FoH engineer asked, as I did, why the channel order had to be that way. Couldn't the monitor engineer use a more sensible channel order? Apparently, the answer is no, because the monitor engineer was using a pre-programmed digital mixer whose tracks could not be reordered. That stray vocalist had been added to their act recently, and so had to be added to the end because the digital mixer was incapable of reordering the tracks into a more sane order!<br/><br/>I don't know how much of the venue's irritation stemmed from not being able to call the shots (FoH engineers usually call the shots on things like channel numbers), and how much stemmed from actual confusion about having the vocals separate, but David told me the venue was definitely upset.<br/><br/>Given this limitation, David did the right thing presenting an input list with the channel numbers assigned in order, and he was able to communicate the situation very clearly to me, so I presume he did so with the venue as well. That said, I agree with the venue that David should consider other options in the future.<br/><br/><h2>What's the Solution?</h2><br/><br/>Since David upset the venue, he did some research, came across some of my previous posts on stage plots and asked for my advice. Before I get to that, I should say that if Funk Syndicate were the only band on the bill, and the FoH engineer was advised of the situation in advance and they had plenty of setup time, the FoH engineer should have been able to deal with the unusual setup. 
It's not ideal, but it's hardly an impossible situation and it's all clearly documented. Oddities like this are life sometimes, and David is clearly open to suggestions. I don't know if the FoH engineer was being a jerk or simply making a recommendation that they change their setup -- all David said was that the venue "gave him some grief." I hope it was the latter.<br/><br/>Clearly, the ideal situation for Funk Syndicate would be to ditch the mixer that can't cope with changing channels. Seriously, I <a href="http://www.xowave.com">love digital gear</a>, but it has to suit your purpose, and in this case, the centerpiece of a <em>live</em> setup is so inflexible that it can't be made to play well with other gear, so it should be dumped. I realize it was a big investment, and Funk Syndicate may not need to go to this extreme for what is, in fact, little more than a nuisance, but it <em>is</em> the <em>right</em> solution. eBay will help recoup the loss. Although this is the best solution, keep in mind that even with a mixer that can swap channels you might still run into trouble -- what if FoH (for whatever reason) needs to route you into channels 48-56 and your mixer only has 36 channels? In this (admittedly very unusual) case, you will not get your channels to line up.<br/><br/>Short of that, I do recommend reprogramming the mixer so that the input list is more in keeping with what FoH engineers are used to. Since Funk Syndicate are probably usually the only act on the bill, probably have plenty of prep time, and do a long set (I am guessing here), many of the usual "don't tie yourself down to one set of channels" arguments won't apply, but I think they will continue to rub FoH engineers the wrong way if they continue with their current setup. I would definitely recommend, at the very least, reprogramming their board so that they could use the following input list (or something like it). Notice that I've left 2 channels open so they've got some room to grow. 
Maybe they should leave even more channels open depending on their mixer and snake configuration, but this way they are not expecting their venue to have more than 24 channels.<br/><br/><table><br/><tr><th>chan</th><th>Input</th><th>Notes</th></tr><br/><tr><td>24</td><td>Frank Vox</td><td></td></tr><br/><tr><td>23</td><td>Neil Vox</td><td></td></tr><br/><tr><td>22</td><td>Kim Vox</td><td>Wireless</td></tr><br/><tr><td>21</td><td>Kevin Vox</td><td></td></tr><br/><tr><td>20</td><td>David Vox</td><td></td></tr><br/><tr><td>19</td><td>Bob Vox</td><td></td></tr><br/><tr><td>18</td><td>Karl Vox</td><td>Wireless</td></tr><br/><tr><td>17</td><td>Open</td><td></td></tr><br/><tr><td>16</td><td>Trumpet</td><td>Wireless</td></tr><br/><tr><td>15</td><td>Sax</td><td>Wireless</td></tr><br/><tr><td>13-14</td><td>EWI</td><td>Stereo</td></tr><br/><tr><td>12</td><td>Guitar</td><td></td></tr><br/><tr><td>10-11</td><td>Keys</td><td>Stereo DI</td></tr><br/><tr><td>9</td><td>Open</td><td></td></tr><br/><tr><td>8</td><td>Bass</td><td>DI</td></tr><br/><tr><td>7</td><td>Overhead</td><td>Single</td></tr><br/><tr><td>4-6</td><td>Toms</td><td>3</td></tr><br/><tr><td>3</td><td>Hi Hat</td><td></td></tr><br/><tr><td>2</td><td>Snare</td><td></td></tr><br/><tr><td>1</td><td>Kick</td><td></td></tr><br/><tr><td>(1-2)</td><td>(FoH)</td><td>Return</td></tr><br/><tr><td>(3)</td><td>(Stage Talkback)</td><td>Return</td></tr><br/></table><br/><br/>Note that I've left returns on there, so that the venues know they are needed, but I've separated them from the rest of the mix. 
If they wanted to use them as inputs to the monitors I would recommend patching them in as auxes, or as much higher channels (not otherwise assigned), rather than assigned channels, because the FoH guy won't want to see his own outputs coming in as input channels.<br/><br/>Again, if Funk Syndicate is in a situation where FoH can't conform, then this won't solve their problems, but it should do the trick in well over 90% of cases and it should keep most FoH engineers pretty happy.<br/><br/>A final solution is to do nothing and simply be extremely humble when facing FoH engineers. This will leave David facing some irritation, frequent (though slight) delays in setup (since things are not where people are used to seeing them) and maybe occasional minor mistakes, but probably never anything worse than that.<br/><br/>So, while there's no perfect solution, there are plenty of workable options. David will have to be aware of the compromises -- irritating FoH mixers a bit vs. reprogramming his mixer vs. suffering with mismatched channels in his monitor and FoH mixes vs. dropping a lot of cash on a new mixer. Even with a new mixer, there may be a rare case of channel mismatch, so nothing is perfect in this world.<br/><br/><hr /><br/><br/><h1>Input Lists (The Other Part of Your Stage Plot) Part I</h1><em>Bjorn Roche, February 21, 2008</em><br/><br/><img align="right" src="http://stuff.bjornroche.com/old-blog-uploads/2007/12/lolsounds2.jpg" width="200" height="110" title="A good input list helps the stage cats patch you in right!"/>Previously I've posted a lot about stage plots. You'd think I'd've said it all, but the truth is I've glossed over something important: the input list. I recently got an email from David Freeland of the Kansas City <a href="http://www.kcfunk.com">Funk Syndicate</a> who had a question about input lists.
I'll get to his question in part II; today, I'll focus on the basic facts of input lists, which are really quite simple.<br/><br/><hr />Previous articles on Stageplots:<br/><ul><br/> <li><a href="http://blog.bjornroche.com/?p=7">What you need to know about playing a festival</a>, which alluded to stageplots.</li><br/> <li><a href="http://blog.bjornroche.com/?p=6">Stage plots and Input Lists (Updated 9/21/06!): what they are, why you need one, and how to make one</a></li><br/> <li><a href="http://blog.bjornroche.com/?p=70">Creating Stageplots on your Mac with OmniGraffle</a></li><br/></ul><br/><hr /><br/><br/>You might be tempted to think input lists are not that important. After all, if you're a huge touring act, you're probably touring with your own engineers who know your inputs, and if you're a five-piece local act, house engineers might not even bother looking at your stage plot, much less your input list, most nights. The truth is that unless you are a solo acoustic act, it's very unlikely that any engineer is going to keep a complete input list in their head, so having it written down is the way to go. If you are ever asked for an input list, it means someone is trying to help you, so give them the best input list you can. Making a good input list is ridiculously easy, and, as David's experience has taught him, having a good input list can save you some grief (actually, there wasn't much wrong with David's input list, and it still caused him some grief!). So let's just take a few seconds right now to get this right!<br/><br/><h2>We've got a stage plot, why an input list?</h2><br/><br/>Your stage plot clearly shows all your instruments, and should show how you want them connected to the sound system (via DI or mike -- you did do that, right?), so why do you also need an input list? There are a few reasons.
The most common reason is that it gives the engineers a checklist of all instruments, so they can make sure they've got signal for everything. It can also be a place for additional notes or reminders. This is especially true if you've got a huge act and you simply can't put all the information you need on your stage plot. It's also a good chance to put the important stuff up top to help your engineers focus on what's important -- e.g. lead vocals first! Since most mix engineers build their mix from the "bottom up" you might want to number your list backwards, like so, but we'll see in a moment why that probably doesn't matter:<br/><br/><table><br/><tr><th>Chan</th><th>Input</th><th>Notes</th></tr><br/><tr><td>7</td><td>Vocals</td><td>Wireless Mike (We can provide)</td></tr><br/><tr><td>6</td><td>Acoustic Guitar</td><td>DI</td></tr><br/><tr><td>5</td><td>Bass</td><td>DI</td></tr><br/><tr><td>3-4</td><td>Drum Overhead</td><td></td></tr><br/><tr><td>2</td><td>Snare</td><td></td></tr><br/><tr><td>1</td><td>Kick</td><td></td></tr><br/></table><br/><br/>Notice how the input list is ordered in the same way that the mixer might be set up -- so that the mix can be built from the "bottom up", and the "money channel", lead vocals, is at the top, where it is least likely to be missed. Notice, also, that stereo sources get two channels -- you'll do the same thing with stereo keyboards and so on.<br/><br/>For most acts, that's all there is to it. Really. You are done. It's that easy, and it will make your show go that much more smoothly.<br/><br/>But, you say, what if another act goes on first and uses channels 5 and 6 for toms, but is otherwise the same? What if the house mixer has inputs 1-16 reserved for some other purpose, like a multitrack feed? What if the mixer has some weird configuration, or the mix engineer likes to put the vocals on the low-numbered channels? What does the house engineer do? The answer is, they ignore your channel numbers. Completely.
And that's okay, because most of the time your channel numbers don't matter, and unless your channel numbers really do matter for some reason, you can stop reading.<br/><br/>If you think your channel numbers really do matter, part II is coming up.<br/><br/><hr /><br/><br/><h1>Creating Stageplots on your Mac with OmniGraffle</h1><em>Bjorn Roche, December 16, 2007</em><br/><br/>A while back I posted about <a href="http://blog.bjornroche.com/?p=6">stage plots</a> and why they are so important, especially at <a href="http://blog.bjornroche.com/?p=7">festival gigs</a>, but I did not go into much detail about how to make one, other than to say draw one by hand if you lack the technical skills. Of course, there are as many ways to make them as there are people making them, but I've been using a program called <a href="http://www.omnigroup.com/applications/omnigraffle/">OmniGraffle</a> for a while, and it occurred to me that it would be a great tool for making stage plots if you're a Mac user. It's extremely intuitive once you spend fifteen minutes with it, and if you've got the right stencils, it's a breeze.<br/><br/>First download and install <a href="http://www.omnigroup.com/applications/omnigraffle/">OmniGraffle</a>. You won't need to buy it to try it out. You may not even need to buy it at all if your stage plots are simple, since it lets you add up to 20 objects before you have to pay. After you've downloaded OmniGraffle and tried it out a bit, <a href="http://stuff.bjornroche.com/old-blog-uploads/2007/11/stage-plots.zip">download my stencils</a>, which include an amp stencil, an instrument stencil, and a general-purpose stage plot stencil.<br/><br/>After downloading them, unzip the file and copy the stencils into ~/Library/Application Support/OmniGraffle/Stencils/.
Next time you start OmniGraffle, it will see the new stencils and you can grab the images and lay them out on a document. When you're done, save the document and use the export feature to save it as a JPEG or PDF. I threw some instruments together and came up with this in no time flat:<br/><br/><center><br/><a href="http://stuff.bjornroche.com/old-blog-uploads/2007/11/sample-stage-plot.pdf"><br/><img src="http://stuff.bjornroche.com/old-blog-uploads/2007/11/sample-stage-plot.jpg" alt="OmniGraffle Stage Plot" width="75%" height="75%" /><br />(click image to enlarge)<br /></a></center><br/><br/>Beautiful it ain't, but clear it is, especially considering all the stuff on stage. For details on what should <strong>really</strong> go into your stage plot, see my <a href="http://blog.bjornroche.com/?p=6">blog posting on that subject</a>, but you get the idea. If you have an instrument not covered (I only did the basics), you can either use a box and label it or find a picture of it online.<br/><br/><hr /><br/><br/><h1>What you need to know about playing a festival</h1><em>Bjorn Roche, October 9, 2006</em><br/><br/>I love stage managing festivals. But a lot of bands, even ones who have been playing in clubs for years, don't know how to prepare for festival shows, which are very different from regular nightclub gigs. With the <a target="_blank" href="http://candlerparkfallfest.org">Candler Park Fall Fest</a> coming up, I thought it would be good to write an article on the subject of preparing to play a festival.<br/><br/><h3>Got stage plot?</h3><br/><br/>I've already talked about the importance of stage plots, especially at festivals, in a <a href="http://blog.bjornroche.com/?p=6">previous posting</a>. The gist is this: stage plots help a lot, so you should have one.
<em>Note: If you're looking for software to make a stage plot, I've added an entry about using a program called OmniGraffle <a href="http://blog.bjornroche.com/?p=70">to make stage plots on your Mac</a>.</em><br/><br/><h3>The Schedule</h3><br/><br/>Staying on schedule is the job of the stage manager, but it's your job to work with them to do so. If you are late, or take time getting set up on stage, or the stage manager expects trouble getting you off stage, or there was a problem earlier and the stage manager needs to make up time, your set may be cut short. It may or may not be fair, but it's always important to get off the stage when asked. I have never had to force a band offstage, but I'm sorry to admit that I have had to ask them to cut their sets a bit short on a few occasions. This was usually because they arrived late, or without a stage plot (or both!), but it has happened that I accidentally gave one band too much time and had to cut a little time from another band. If this happens to you, it sucks. It really sucks. But it happens, and it's good to be prepared. For example, if you've got a song or two you want to finish with, it's a good idea to keep track of time yourself (or ask for updates while on stage) so that you can skip a less important song.<br/><br/>If you feel that you behaved professionally and the stage manager cut your set short unfairly, especially by more than five or ten minutes, bring it up with them after the show or tell the booking agent or event director. You might not get preferential treatment next time, or extra money, or even an apology, if the stage manager/event planner thinks they made the right decision, but at least they'll know how you feel and it will help them do a better job next time.<br/><br/><h3>Forget the sound check</h3><br/><br/>Most festivals try to cram in as much music as possible.
This, combined with other considerations (like neighborhood noise laws), typically means there's no time for sound checks, so it's important to do what you can to help your sound crew prepare. Most sound crews can do a fine job even without a sound check, and, as you play, they'll tweak things very quickly so that you'll be sounding great long before your first song is over. However, it's always good to check in with them before you go on so they know what to expect and can set things up for you from the get-go. The most important thing you can do is provide a stage plot, but talking to them and describing your sound and what's most important to you about your sound helps a great deal.<br/><br/><h3>Meet your monitor mixer</h3><br/><br/>The sound on stage can be quite good at a festival, especially outdoors, where the lack of walls and ceiling may help reduce feedback. But remember how you didn't get to do a sound check? That means it may not sound perfect on stage from the first song. At a well run festival with a large stage, you'll typically have separate engineers for FoH (Front of House, which is what the audience hears) and monitors (which is what you hear). Make sure to introduce yourself to your monitor engineer before the show, and tell them what each musician needs to hear onstage. Typically, they can guess some things, like that the vocalists need to hear themselves, but it's good to check in with them either way. If you get onstage and you're not hearing what you need to hear, alert the monitor mixer or stage manager. Most systems allow the monitor mixer to listen to what is going through each monitor speaker without having to go onstage to listen, but they might not know a problem exists on a certain monitor speaker without you telling them. 
Alerting them to the trouble really can make a difference (just be sure to tell them nicely, since the monitor mixer is your band's best friend while you're on stage).<br/><br/>As with performing in a venue, the sound onstage can vary drastically from the house sound. That's nothing to worry about, but it may be disorienting. Many singers have a hard time hearing their vocals with little to no reverb, for example, but reverb can cause feedback problems, and the monitor mixer usually has fewer effects at his or her disposal, so it's best to do without if at all possible.<br/><br/><h3>Share equipment</h3><br/><br/>Many bands, especially those with younger members, are adamant about using their own equipment, even if the festival provides equipment for them. This is understandable, especially if the band has a very particular sound that requires special equipment. However, consider this:<br/><br/><ul><br/><li>It is usually harder to get equipment in and out of a festival setting. Festivals often take place in settings that were not designed with large equipment in mind.</li><br/><li>Any time you spend getting your equipment on or off stage is time taken away from your set (especially if I'm stage managing the event).</li><br/></ul><br/><br/>Finally, I'll let you in on a little secret: outdoor sound reinforcement sucks. Yes, you heard it here first: doing outdoor sound is hard. Doing a good job without a sound check or an engineer who has heard your band before is hard. Setting up your equipment correctly while the previous band is striking, and doing it all in 15 or 20 minutes (10 minutes on my stage), is hard. Even with the right equipment, the end result can be mediocre compared to virtually any indoor setting, and very often the right equipment is not available.
So be sure to ask yourself this simple question: do you really want to spend all that time getting the "perfect sound" that no one's going to hear anyway, or do you want to have time to play an extra song or two?<br/><br/>I don't mean to say that there's never a good reason to use your own equipment, just that you may have to make a choice between hauling gear and playing more songs. The choice is yours, and for many bands getting the right sound is both legitimate and necessary. Just be sure not to make a decision that cuts into your set time unless you have to.<br/><br/><h3>Expect the Unexpected</h3><br/><br/>Things usually go well if everyone is prepared. Sometimes things go well even if someone is not prepared. But at a festival, it's not uncommon for things to go wrong. With much of the work being done by volunteers, and without the common luxuries of sound checks and familiar stages and equipment, someone is bound to be caught off guard at some point. With all the adrenaline and exhaustion of everyone there, it's easy to get in a tizzy about this and make more mistakes (maybe I could write another entry just about that), so it's especially important to remember that everyone is there to have fun, and most people, especially the audience, are quite forgiving.<br/><br/>With the right preparation, though, serious problems are almost always avoidable, so do your homework (make a stage plot!) and then relax and break a leg!<br/><br/><hr /><br/><br/><h1>Stage plots and Input Lists (Updated 9/21/06!): what they are, why you need one, and how to make one</h1><em>Bjorn Roche, September 14, 2006</em><br/><br/><em>Note: If you're looking for software to make a stage plot, I've added an entry about using a program called OmniGraffle <a href="http://blog.bjornroche.com/?p=70">to make stage plots on your Mac</a>.</em><br/><br/>Occasionally, I have the time to work on a music event. It doesn't happen often, but when it does, I have the pleasure of working with bands and helping them do a great show. For the past three years now, I've been stage manager for the <a href="http://www.candlerparkfallfest.org">Candler Park Fall Fest</a>, and it's a great experience. A lot of other festivals are not well run -- bands just show up at the specified time and try to make do with what's there. They're often lucky if there's more than one sound engineer. This can be a frustrating experience for everyone, but especially for the bands, so when I work on an event I try to do everything I can to run the stage as professionally as possible. It also takes a little cooperation from the bands.<br/><br/>In order to make a festival run well, a lot of things need to come together, and one of the most important parts of that is getting accurate stage plots and input lists for all the bands. Most professional bands have a stage plot as part of their contract rider. Even if they do their own stage setup and have lots of time to sound check, a stage plot helps ensure in advance that the necessary equipment will be provided. When it comes to doing festivals, though, many luxuries, such as sound checks and a nice long break between acts, simply aren't there, and the stage plot helps ensure that things go smoothly, because, with a stage plot, <em>everybody</em> knows exactly what needs to be done.<br/><br/>I do all my advance planning around the stage plots, making sure that the required equipment is available and that any tricky band transitions are worked out in advance.
When the festival starts, I plan the whole day around my stage plots, and I use them to coordinate with the bands, the stage crew, and both the monitor and FoH (Front of House) positions to ensure that everything is set up, miked and amplified properly and that every band is ready to go on as quickly as possible. Towards the end of each set, I go over the next act's stage plot again, so I can give instructions to the stage crew and prepare for the transition. If there is anything tricky or unusual, I usually oversee it myself during the transition. With a good stage plot, I can give instructions and help the bands and crew set up.<br/><br/>Without a stage plot, we have to figure it out as we go along. With a stage plot, I can accurately assess how long each transition will be, and time the sets so that each band can play as long as possible without getting off schedule. Without a stage plot, I have to assume that it's going to take a long time, so everyone gets shorter sets. (I am a real stickler for the schedule, because if you go off schedule, someone ends up at the short end of the stick.)<br/><br/>You don't get more time in your set because it took you a long time to set up, so providing a stage plot and input list means you get a longer set. At last year's Fall Fest, bands that were prepared played for about 50 minutes, while unprepared bands were forced to cut their sets down to as little as half an hour. It makes that much difference.<br/><br/><h1>How to Make a Stage Plot</h1><br/><br/>One thing I've learned is that many bands don't have a stage plot or input list, and many don't even know what they are. That's fine if you're just playing the local pub. It's even fine if you're playing big venues and have lots of prep time. It's not fine if you're playing a festival.<br/><br/>Fortunately, making a stage plot is easy. All you need to do is draw a bird's-eye view of the stage with everything important, such as musician locations, mikes and DI boxes.
Mike stands should be labeled boom or straight. Mike types should be specified if they are non-standard. Monitor locations and any other on-stage requirements should be specified. Top it off with your band name, date and contact info and you're done. Don't try to get too fancy or show off your art skills; that only makes it harder to read. Remember, this is not promotional material -- it's technical information.<br/><br/>Put your band name and contact info at the top. It's also good to date the plots so that you can ensure that everyone has the current one. Below that is the actual diagram. Most stage plots have the band facing down, so the audience is at the bottom of the page. I prefer this view as it puts down-stage down and up-stage up, and house left on the left and house right on the right, but the upside-down version of this is acceptable, too. Here are some examples of good stage plots, some of which include input lists, which we'll get to in a minute:<br/><br/><ul><br/><li><a href="http://stuff.bjornroche.com/old-blog-uploads/2006/09/ht-stage-plot-05.jpg">Johnny Knox and Hi-Test have an almost perfect stage plot</a>. I've blacked out the contact info, but otherwise, this is their plot. Everything that's important to them is here -- locations of equipment, types of stands, even info such as reverb settings. Unusual things, such as no DI on the bass, are clearly marked. There are 2 minor problems: 1. it should have a date, so everyone knows this is the current stage plot, and 2. the stage plot is upside-down from my preferred orientation. Still, this is a great stage plot, and it can easily be used to get them on stage and sounding great fast, with minimal hassle.</li><br/><li><a href="http://www.papagrowsfunk.com/production/stageplot.html">Papa Grows Funk</a> have a great stage plot. It clearly shows everything a stage manager and stage crew need to know, including power and specific mike requirements.
It's probably overkill for a festival, where 9 mikes probably can't be provided for the drums, but that doesn't matter because the critical info is there.</li><br/><li><a href="http://www.rhythm-city.com/gifjpg/rcstageplot.pdf">Rhythm City's stage plot</a> is decent, though it could stand to have a key to assure the person looking at it that the box with a circle in it is a monitor amp. It would also be nice if the stage plot indicated clearly whether each stand was straight or boom -- as is, the stage crew would probably assume straight unless it says otherwise, or they might assume a boom stand is okay, which it might not be. It would also be great to have musicians' names and numbers corresponding to numbers in the key below, so there was no confusion. Finally, a band can request as many 57s and 58s as they like, but they should not expect to get four Beta 58s, as this band does.</li><br/></ul><br/><br/>If you don't have the software or software skills to make a stage plot on the computer, there is nothing wrong with doing it by hand and scanning or photocopying it. Just make sure to write clearly and legibly. Remember, this is not promotional material, so just stick to the facts, and keep it simple and clear.<br/><br/><h1>Stage Plot File Formats</h1><br/><br/>Whatever program you use to make your stage plots, from Photoshop to Microsoft Word or PowerPoint, or even boutique software such as <a href="http://www.omnigroup.com/applications/omnigraffle/">OmniGraffle</a>, it's important to send it in a format that can be read by the stage manager. I have Office for Mac, but it doesn't always read e-mailed PC Word files correctly. In fact, you shouldn't count on your end user having Office or any other particular program. Instead, send it in a format specifically designed to be portable and easy to read. You can send a PDF, JPEG, GIF, PNG or anything else commonly used on the web.
If all else fails, print it out and mail it.<br/><br/><h1>Input Lists</h1><br/><br/>An input list simply lists every mike, DI box, synth rack or other sound source on your stage, whether you are providing it or the sound company is providing it. You want to say what the instrument is, and how sound is to get from the instrument to the mixers. For example, a solo guitarist/singer might provide an input list like this:<br/><br/><table><br/><tr><th>Channel</th><th>Instrument</th><th>Input Method</th><th>Notes</th></tr><br/><tr><td>1</td><td>Male Vocals</td><td>Mike (SM 58 or equiv.)</td><td>Very dynamic, but good mike technique. Please use just a touch of reverb. Stereo small hall preferred.</td></tr><br/><tr><td>2-3</td><td>Guitar</td><td>DI</td><td>I have two effects pedals. Stereo out. Please add no additional effects except compression and EQ as needed. Should be bright but not overpower the vocals.</td></tr><br/></table><br/><br/>If it's a long list, it's good to group the inputs the way a mix engineer is most likely to group them. For example, you might start with drums, percussion and bass, and move up to rhythm guitar, keys, lead guitar, and then vocals. Better yet, do it in the reverse order, so drums and rhythm are at the bottom and lead instruments like the voice are at the top. It's a good idea to highlight or mark the important things: the notes section in the table above is probably overkill in this department, but it doesn't hurt.<br/><br/>If your band has a great lead guitarist or the vocals really are your band's "money channel", say so on the input list. That way, the mix engineer knows to pay special attention to those instruments.
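As an aside, if you keep your input list in a spreadsheet or generate it from a script, the bottom-up numbering described above is easy to automate. This is purely an illustrative sketch (the `number_input_list` helper and the sample data are mine, not part of any standard tool): you list your sources with the "money channel" first, and it assigns channel numbers so channel 1 lands at the bottom and stereo sources get a channel pair.

```python
# Illustrative sketch only: assign channel numbers from the bottom up,
# so lead instruments stay at the top of the page and channel 1 lands
# at the bottom, the way most mix engineers expect to read the list.
def number_input_list(inputs):
    """inputs: (name, channel_count, notes) tuples, ordered with the
    "money channel" (e.g. lead vocals) first. Stereo sources use 2."""
    total = sum(count for _, count, _ in inputs)
    rows, high = [], total
    for name, count, notes in inputs:
        low = high - count + 1
        # single channel -> "7"; stereo pair -> "3-4"
        chan = str(high) if count == 1 else f"{low}-{high}"
        rows.append((chan, name, notes))
        high = low - 1
    return rows

for chan, name, notes in number_input_list([
    ("Vocals", 1, "Wireless Mike (We can provide)"),
    ("Acoustic Guitar", 1, "DI"),
    ("Bass", 1, "DI"),
    ("Drum Overhead", 2, ""),  # stereo source: two channels
    ("Snare", 1, ""),
    ("Kick", 1, ""),
]):
    print(f"{chan:>4}  {name:<16} {notes}")
```

Run on the sample data above, this reproduces the ordering from the earlier table (vocals on channel 7 at the top, kick on channel 1 at the bottom). Again, none of this is required; a hand-numbered table works just as well.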
Most professional mix engineers do read this stuff, so if something is important, write it down!<br/><br/><h2>Other stuff</h2><br/><br/>Finally, if you really want to sound your best, there are a few other things you can provide your stage manager with:<br/><br/><ul><br/><li><strong>A Set List:</strong> I may not know the songs, but if you make notes about solos, false endings, and song dynamics, the mix engineer will be prepared and able to make you sound that much better. I rarely receive these, but when I do, I tend to get more compliments on the sound, so I know it makes a real difference.</li><br/><li><strong>Description of Monitor Mixes:</strong> We know the lead vocalist needs to hear themselves as loud as possible, so we don't need to know that, but if you need something unusual, like extra keyboards in the drum monitor, it's a good idea to have that written down somewhere.</li><br/><li><strong>Photos:</strong> It's great to have pictures of all your band members so no one asks them to leave if they are found hanging out backstage. You should ask your band members to introduce themselves to the stage managers and monitor mixer as soon as they arrive. That's also a good time to discuss any last minute changes or go over your requirements.</li><br/></ul><br/><br/><h1>Break a Leg!</h1><br/><br/>That's it. With this information you are now armed and ready to play at my festivals and probably anybody else's. If you are not asked well in advance for this information, or you get a blank stare when you provide it, it probably means you can expect trouble when it comes time to perform, so be prepared. Now, go break a leg!<br/><br/><hr /><br/><br/><h1>Johnny Knox and High Test Stage Plot</h1><em>Bjorn Roche, September 14, 2006</em>