Wednesday, November 20, 2013

Solving acoustics problems

A "waterfall plot" like this one is one of many tools used by
acousticians to determine the problems with a room.
Photo from realtraps which provides high quality bass traps,
an important type of acoustic treatment.
I recently received the following letter (edited):
      

Greetings,

The echo in my local church is really bad.  I am lucky if I can understand 10% of what’s being said.   I have checked with other members of the congregation and without exception they all have the same problem.

The church is medium size with high vaulted ceiling, very large windows with pillars spaced throughout.  The floor is mostly wood.   The speakers are flat against the side walls, spaced approx 15 metres apart and approx 10 feet above the floor.

The speakers are apparently ‘top of the range’… I just wonder if a graphic equalizer was used between the microphone and speaker, would this ‘clean up’ the sound a little?

I know that lining the walls with acoustic tiles and carpeting the floor would lessen the echo, but, we don’t want to do that if we can avoid it.

With regard to putting carpet on the floor, my thoughts are that instead of sound being absorbed by the carpet, the congregation present would absorb just as much as the carpet? One other theory I have is regarding the speakers.

If  the speakers were moved…

Michael

Hey Michael,

I sympathize with you. Going to service every week and not being able to understand what is being said must be very frustrating. While this is not the kind of thing I do every day, I do have some training  in this area and will do my best to give you something helpful.

Most churches are built with little attention to acoustics and old churches were built before there was any understanding of what acoustics is. With all those reflective surfaces and no care taken to prevent the acoustic problems that they create, problems are inevitable, and sometimes, such as in your church, they are simply out of hand. In a situation like that, even a great sound-system won't be able to solve the problem.

I recommend you hire a professional in your area to come look at the space and be able to give some more specific feedback. To have them improve the situation may cost anywhere from hundreds to tens of thousands of dollars (or even more) depending on the cause of problem. However, it's helpful to have some idea of what some of the solutions are so that when you hire that professional you are prepared for what's to come. You might be able to do some more research and take a stab at solving these issues yourself.

For example, it might be useful to listen to the room and conjecture, even without measurements, whether the problem is bound to specific frequencies or is just a matter of too many echoes. If you are a trained listener, you might stand in various places in the room, clap loudly, and listen to get a sense of this. Even a trained listener would never substitute such methods for actual measurements, but I often find this method useful for developing a hypothesis (e.g. I might listen and say "I believe there is a problem in the low frequencies" before measuring, then use measurements to confirm or reject that hypothesis). Also, look at the room: are there lots of parallel walls? If so, you are likely suffering from problems at specific frequencies, and it's possible that a targeted, and probably less expensive, approach will help.

Another thing you can do is find someone with some stage acting experience and have them speak loud and clear at the pulpit. Have them do this both with and without the sound system and listen to the results. If they sound much clearer without the sound-system than with the sound-system, then that suggests that your sound-system may be causing at least some of the problems.

If you can't afford an acoustician, but you are willing to experiment a bit, this kind of testing might lead you to something. For example, maybe you notice some large open parallel walls and you agree that covering one or both of them with some heavy draperies is either acceptable or would look nice. You could try it and see if it helps. It's no guarantee, but it might make a difference. Draperies are, of course, unlikely to make that much difference by themselves, so you might consider putting acoustic absorbing material behind them.

Be warned, however, that acoustic treatments done by amateurs without measurements are often beset with problems. For example, you may reduce the overall reverberation time but leave long echoes at certain frequencies. This can yield results that are no better than where you started -- possibly even worse (although in your case I think that's unlikely).

Here are the types of things a professional is likely to recommend. You've already alluded to all of them, but I'll repeat them with some more detail. I put them roughly in order of how likely they are to help, but it does depend on your specific situation:
  • Acoustic treatments. Churches like the one you describe are notorious for highly reflective surfaces like stone and glass, and as you surmised, adding absorptive materials to the walls, floors and ceiling will reduce the echo significantly. Also as you surmised, floor covering may be of limited effectiveness, since people also absorb and diffuse sound, but, of course, it depends on how much of the floor they cover and where. I understand your hesitation to go this route since it may impact the aesthetics of the church, and it may be expensive, but, as I mentioned above, depending on the specific situation, you may be able to achieve a dramatic acoustic improvement with relatively little visual impact, and depending on the treatment needed you may be able to keep your costs controlled. You should also be able to collaborate with someone who can create acoustic treatments that are either not noticeable or that enhance the aesthetics of your space. (Of course, you'll also need someone familiar with things like local fire codes!)
  • Adjusting the speakers. It's certainly possible that putting the speakers in another location would help. If they were hung by a contractor or someone who did not take acoustics into account, they are likely to be placed poorly. Location matters more than the quality of the speakers themselves. Also, if the speakers are not in one cluster at the front, adding the appropriate delay to each set of speakers can help ensure that sound arrives "coherently" from all speakers, which can improve intelligibility significantly. Devices that provide this kind of delay, along with lots of other features, are sold under various names such as "speaker processors," "speaker array controllers," and so on.
  • Electronic tools. Although this is likely to be the least effective option, you can usually achieve some improvement with EQ, as you suggested. For permanent installations, I prefer parametric EQs, but a high-quality graphic EQ will also work. An ad-hoc technique for setting the EQ is to increase the gain until you hear feedback, and then notch out the frequency that causes the feedback; continue increasing the gain until you are happy with the results. You must be very careful to protect your speakers and your hearing when using this technique, both of which can be easily damaged if you don't know what you are doing. Most speaker processors have built-in parametric EQs, and some even come with a calibrated mic that the device can use to adjust the settings for you automatically. I've done this, and it works great, especially with a little manual tweaking, but you do have to know what you are doing. And, of course, you can't work miracles in a bad room.

Saturday, September 21, 2013

Mapping Parameters


Visualizing a Linear Mapping
Very often we need to "map" one set of values to another. For example, if we have a slider that ranges from 0 to 1, and we want to use it to control the value of a frequency setting. Or perhaps we have the output of a sine wave (which ranges from -1 to 1) and we want to use that to control the intensity of a EQ. In these cases and many more, we can use a linear mapping to get from one range of values to another.

A linear mapping is simply a linear equation, such as y = mx + b, that takes an input, your slider value for example, and gives you back an output. The input is x, and the output is y. The trick is to find the values of m and b.

Let's take a concrete example. Let's say you have the output of a sine wave (say from an LFO) that oscillates between -1 and 1. Now we want to use those values to control a frequency setting from 200 to 2000. In this case, x from the equation above represents the oscillator, and y represents the frequency setting.

We know two things: we want x=-1 to map to y=200, and x=1 to map to y=2000. Since our original equation, y = mx + b, had two unknowns (m and b), we can solve it:

Original equation with both unknowns:
y = mx + b

Substituting our known values for x and y:
200 = (-1)m + b
2000 = (1)m + b

Adding the two equations to eliminate m, then solving for b:
2200 = 2b
1100 = b

Substituting b = 1100 into the second equation and solving for m:
2000 = m + 1100
900 = m

Final equation:
y = 900x + 1100

You can check the final equation by substituting -1 and 1 for x and making sure you get 200 and 2000 respectively for y.

So in our LFO/frequency example, we would take our LFO value, say .75, and use that as x. Then plug that value into the formula (y=900(.75) + 1100=1775) and get our final value for our frequency setting.
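
If you want to do this in code, here is a minimal sketch in C. The function name map_linear is just illustrative; it computes m and b from the two ranges exactly as we did by hand above.

/* Map x linearly from [in_min, in_max] to [out_min, out_max].
   A general-purpose sketch; the name map_linear is just illustrative. */
float map_linear( float x, float in_min, float in_max,
                  float out_min, float out_max )
{
   const float m = (out_max - out_min) / (in_max - in_min); /* slope */
   const float b = out_min - m * in_min;                    /* intercept */
   return m * x + b;
}

/* Example: map_linear( 0.75f, -1.0f, 1.0f, 200.0f, 2000.0f ) returns 1775.0f,
   matching the LFO/frequency example above. */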

Sunday, July 21, 2013

Peak Meters, dBFS and Headroom

The level meter from audiofile engineering's spectre program accurately shows peak values in dBFS.
Level meters are one of the most basic features of digital audio software. In software, they are very often implemented as peak meters, which are designed to track the maximum amplitude of the signal. Other kinds of meters, such as VU meters, are often simulations of analog meters. Loudness meters, which attempt to estimate our perception of volume rather than volume itself, are also becoming increasingly common. You may also come across RMS and average meters. In this post, I'm only going to talk about peak meters.

Peak Meters

Peak meters are useful in digital audio because they show the user information that is closely associated with the limits of the medium and because they are efficient and easy to implement. Under normal circumstances, we can expect peak meters to correspond pretty well with our perception of volume, but not perfectly. The general expectation users have when looking at peak meters is that if a signal goes above a certain level at some point, that level should be indicated on the meters. In other words, if the signal goes as high as, say -2 dBFS, over some time period, then someone watching the peak meter during that time will see the meter hit the -2 dBFS mark (see below for more on dBFS). Many peak meters have features such as "peak hold" specifically designed so that the user does not need to stare at the meter.

Beyond that, there are rarely any specifics. Some peak meters show their output linearly, some show their output in dB. Some use virtual LEDs, some a bar graph. In general, if there is a numeric readout or units associated with the meter, the unit should be dBFS.

Now that we know the basics of peak meters, let's figure out how to implement them.

Update Time

Peak meters should feel fast and responsive. However, they don't update instantly. In software, it is not uncommon to have audio samples run at 44100 samples per second while the display refreshes at only 75 times per second, so there is absolutely no point in showing the value of each sample (not to mention the fact that our eyes couldn't keep up). Clearly we need to figure out how to represent a large number of samples with only one value. For peak meters, we do this as follows:

  1. Figure out how often we want to update. For example, every 100 ms (.1s) is a good starting point, and will work well most of the time.
  2. Figure out how many samples we need to aggregate for each update. If we are sampling at 44100 Hz, a common rate, and want to update every .1s, we need N = 44100 * .1 = 4410 samples per update.
  3. Loop over blocks of size N. Find the peak in each block and display it. If the graphics system can't display a given peak in time, the next iteration should display the max of any undisplayed peaks (see the sketch below).
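
Here is a minimal sketch of those steps in C. The inner peak-finding loop is covered in more detail in the next section; display_peak() and ui_is_ready() are hypothetical stand-ins for whatever your graphics layer actually provides.

#include <math.h>   /* fabsf */

#define SAMPLE_RATE  44100
#define BLOCK_SIZE   4410   /* SAMPLE_RATE * 0.1 s: one meter update per block */

static float held_peak = 0;  /* max of any peaks the display hasn't shown yet */

/* Called once per block of BLOCK_SIZE samples. */
void meter_update( const float *block )
{
   float peak = 0;
   for( int i = 0; i < BLOCK_SIZE; ++i ) {
      const float v = fabsf( block[i] );
      if( v > peak )
         peak = v;
   }
   if( peak > held_peak )
      held_peak = peak;

   if( ui_is_ready() ) {          /* hypothetical: can the display take a new value? */
      display_peak( held_peak );  /* hypothetical UI call */
      held_peak = 0;
   }
}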

Finding the Peak

Sound is created by air pressure swinging both above and below the mean pressure.
Finding the peak of each block of N samples is the core of peak metering. To do so, we can't simply find the maximum value of all samples because sound waves contain not just peaks, but also troughs. If those troughs go further from the mean than the peaks, we will underestimate the peak.

The solution to this problem is simply to take the absolute value of each sample, and then find the max of those absolute values. In code, it would look something like this:



float max = 0;
for( size_t i = 0; i < buf.size(); ++i ) {    // buf is e.g. a std::vector<float>
   const float v = std::fabs( buf[i] );       // needs <cmath>; plain abs() would truncate to int
   if( v > max )
      max = v;
}

At the end of this loop, max is your peak value for that block, and you can display it on the meter, or, optionally, calculate its value in dBFS first.

Calculating dBFS or Headroom

(For a more complete and less "arm wavy" intro to decibels, try here or here.) The standard unit for measuring audio levels is the decibel or dB. But the dB by itself is something of an incomplete unit, because, loosely speaking, instead of telling you the amplitude of something, dB tells you the amplitude of something relative to something else. Therefore, to say something has an amplitude of 3dB is meaningless. Even saying it has an amplitude of 0dB is meaningless. You always need some point of reference. In digital audio, the standard point of reference is "Full Scale", ie, the maximum value that digital audio can take on without clipping. If you are representing your audio as a float, 0 dB is nominally calibrated to +/- 1.0. We call this scale dBFS. To convert the above max value (which is always positive because it comes from an absolute value) to dBFS use this formula:

dBFS = 20 * log10(max);

You may find it odd that the loudest a signal can normally be is 0 dBFS, but this is how it is. You may find it useful to think of dBFS as "headroom", ie, answering the question "how many dB can I add to the signal before it reaches the maximum?" (Headroom is actually equal to -dBFS, but I've often seen headroom labeled as dBFS when the context makes it clear.)
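
Putting the last two pieces together, a small C sketch of the conversion might look like this. The -120 dB floor for silence is an arbitrary choice of mine, just to avoid taking log10(0).

#include <math.h>   /* log10f */

/* Convert a peak value (0..1, from the loop above) to dBFS. */
float peak_to_dbfs( float max )
{
   if( max <= 0 )
      return -120.0f;            /* arbitrary floor; log10(0) would be -infinity */
   return 20.0f * log10f( max );
}

/* Example: a peak of 0.5 gives about -6.02 dBFS, i.e. about 6 dB of headroom. */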

Thursday, May 30, 2013

The ABCs of PCM (Uncompressed) digital audio

Digital audio can be stored in a wide range of formats. If you are a developer interested in doing anything with audio, whether it's changing the volume, editing chunks out, looping, mixing, or adding reverb, you absolutely must understand the format you are working with. That doesn't mean you need to understand all the details of the file format, which is just a container for the audio which can be read by a library. It does mean you need to understand the data format you are working with. This blog post is designed to give you an introduction to working with audio data formats.

Compressed and Uncompressed Audio

Generally speaking, audio comes in two flavors: compressed and uncompressed. Compressed audio can further be subdivided into different kinds of compression: lossless, which preserves the original content exactly, and lossy which achieves more compression at the expense of degrading the audio. Of these, lossy is by far the most well known and includes MP3, AAC (used in iTunes), and Ogg Vorbis. Much information can be found online about the various kinds of lossy and lossless formats, so I won't go into more detail about compressed audio here, except to say that there are many kinds of compressed audio, each with many parameters.

Uncompressed PCM audio, on the other hand, is defined by two parameters: the sample rate and the bit-depth. Loosely speaking, the sample rate limits the maximum frequency that can be represented by the format, and the bit-depth determines the maximum dynamic range that can be represented by the format. You can think of bit-depth as determining how much noise there is compared to signal.

CD audio is uncompressed and uses a 44,100 Hz sample rate and 16-bit samples. What this means is that audio on a CD is represented by 44,100 separate measurements, or samples, taken per second, with each sample stored as a 16-bit number. Audio recorded in studios often uses a bit depth of 24 bits and sometimes a higher sample rate.
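
As a quick back-of-the-envelope check on what those numbers imply for data rate (my arithmetic, not a figure from the CD specification): 44,100 samples/s * 2 bytes/sample * 2 channels = 176,400 bytes per second, or roughly 10 MB per minute of stereo audio.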

WAV and AIFF files support both compressed and uncompressed formats, but are so rarely used with compressed audio that these formats have become synonymous with uncompressed audio. The most common WAV files use the same parameters as CD audio: 44,100 Hz and bit depth of 16-bits, but other sample rates and bit depths are supported.

Converting From Compressed to Uncompressed Formats

As you probably already know, lots of audio in the world is stored in compressed formats like MP3. However, it's difficult to do any kind of meaningful processing on compressed audio. So, in order to change a compressed file, you must uncompress, process, and re-compress it. Every compression step results in degradation, so compressing it twice results in extra degradation. You can use lossless compression to avoid this, but the extra compression and decompression steps are likely to require a lot of CPU time, and the gains from compression will be relatively minor. For this reason, compressed audio is usually used for delivery and uncompressed audio is usually used in intermediate steps.

However, the reality is that sometimes we process compressed audio. Audiophiles and music producers may scoff, but sometimes that's life. For example, if you are working on mobile applications with limited storage space, telephony and VOIP applications with limited bandwidth, or web applications with many free users, you might find yourself needing to store intermediate files in a compressed format. Usually the first step in processing compressed audio, like MP3, is to decompress it. This means converting the compressed format to PCM. Doing this involves a detailed understanding of the specific format, so I recommend using a library such as libsndfile, ffmpeg, or lame for this step.

Uncompressed Audio

Most stored, uncompressed audio is 16-bit. Other bit depths, like 8 and 24 are also common and many other bit-depths exist. Ideally, intermediate audio would be stored in floating point format, as is supported by both WAV and AIFF formats, but the reality is that almost no one does this.

Because 16-bit is so common, let's use that as an example to understand how the data is formatted. 16-bit audio is usually stored as packed 16-bit signed integers. The integers may be big-endian (most common for AIFF) or little-endian (most common for WAV). If there are multiple channels, the channels are usually interleaved. For example, in stereo audio (which has two channels, left and right), you would have one 16-bit integer representing the left channel, followed by one 16-bit integer representing the right channel. These two samples represent the same time and the two together are sometimes called a sample frame or simply a frame.

Sample Frame 1:  [ Left MSB ][ Left LSB ][ Right MSB ][ Right LSB ]
Sample Frame 2:  [ Left MSB ][ Left LSB ][ Right MSB ][ Right LSB ]

Two sample frames of big-endian, 16-bit interleaved audio. Each box represents one 8-bit byte.

The above example shows 2 sample frames of big-endian, 16-bit interleaved audio. You can tell it's big-endian because the most significant byte (MSB) comes first. It's 16-bit because 2 8-bit bytes make up a single sample. It's interleaved because each left sample is followed by a corresponding right sample in the same frame.

In Java, and most C environments, a 16 bit signed integer is represented with the short datatype. Therefore, to read raw 16 bit data, you will usually want to get the data into an array of shorts. If you are only dealing with C, you can do your IO directly with short arrays, or simply use casting or type punning from a raw char array. In Java, you can use readShort() from DataInputStream.
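
If you are assembling the samples from raw bytes yourself rather than letting a library or readShort() do it, the conversion looks something like this. This is a sketch for big-endian data; for little-endian data (the usual case for WAV), swap the two bytes.

#include <stdint.h>

/* Assemble one signed 16-bit sample from two bytes of big-endian data. */
int16_t sample_from_bytes_be( const unsigned char *p )
{
   const unsigned char msb = p[0];
   const unsigned char lsb = p[1];
   return (int16_t)( (msb << 8) | lsb );
}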

To store 16-bit stereo interleaved audio in C, you might use a structure like this:

typedef struct {
   short l;   /* left channel sample */
   short r;   /* right channel sample */
} stereo_sample_frame_t;

or you might simply have an array of shorts:

short samples[];

In the latter case, you would just need to be aware that when you index an even number it's the left channel, and when you index an odd number it's the right channel. Iterating through all your data and finding the max on each channel would look something like this:

int sampleCount = ... ; // total number of samples = sample frames * channels
int frames = sampleCount / 2 ;
short *samples = ... ;  // filled in elsewhere
// MAX is assumed defined, e.g. #define MAX(a,b) ((a) > (b) ? (a) : (b))

short maxl = 0;
short maxr = 0;
for( int i = 0; i < frames; ++i ) {
   maxl = (short) MAX( maxl, abs( samples[2*i] ) );     // even indices: left channel
   maxr = (short) MAX( maxr, abs( samples[2*i+1] ) );   // odd indices: right channel
}
printf( "Max left %d, Max right %d.\n", maxl, maxr );

Note how we find the absolute value of each sample. Usually when we are interested in the maximum, we are looking for the maximum deviation from zero, and we don't really care if it's positive or negative -- either way is going to sound equally loud.

Processing Raw Data

You may be able to do all the processing you need to do in the native format of the file. For example, once you have an array of shorts representing the data, you could divide each short by two to cut the volume in half:

int sampleCount = ... ; // total number of samples = sample frames * channels
short *samples = ... ;  // filled in elsewhere

for( int i = 0; i < sampleCount; ++i ) {
   samples[i] /= 2 ;
}


A few things to watch out for:

  • You must actually use the native format of the file, or do the proper conversion. You can't simply treat the data as a stream of bytes. I've seen many questions on Stack Overflow where people make the mistake of dealing with 16-bit audio data byte-by-byte, even though each sample of 16-bit audio is composed of 2 bytes. This is like adding multi-digit numbers while ignoring the carry.
  • You must watch out for overflow. For example, when increasing the volume, be aware that some samples may end up out of range. You must ensure that all samples remain in the correct range for their datatype. The simplest way to handle this is with clipping (discussed below), which will result in some distortion, but is better than the "wrap-around" that will happen otherwise. (The example above does not have to watch out for overflow because we are dividing, not multiplying.)
  • Round-off error is virtually inevitable. If you are working in an integer format, e.g. 16-bit, it is almost impossible to avoid round-off error. The effects of round-off will be minor but ugly, and eventually these errors will accumulate and become noticeable. The example above will definitely have problems with round-off error.
As long as studio quality isn't your goal, however, you can mix, adjust volume and do a variety of other basic operations without needing to worry too much.

Converting and Using Floating Point Samples

If you need more powerful or flexible processing, you are probably going to want to convert your samples to floating point. Generally speaking, the nominal range used for audio when audio is represented as floating point numbers is [-1,1].

You don't have to abide by this convention. If you like, you can simply convert your raw data to float by casting:

short s = ... // raw data
float f = (float) s;

But if you have some files that are 16-bit and some that are 24-bit or 8-bit, you will end up with unexpected results:

signed char d1 = ... ; // data from 8-bit file (plain char may be unsigned on some platforms)
float f1 = (float) d1; // now in range [ -128, 127 ]
short d2 = ... ;       // data from 16-bit file
float f2 = (float) d2; // now in range [ -32,768, 32,767 ]

It's hard to know how to use f1 and f2 together since their ranges are so different. For example, if you want to mix the two, you most likely won't be able to hear the 8-bit file. This is why we usually scale audio into the [-1,1] range.

There is much debate about the right constants to use when scaling your integers, but it's hard to go wrong with this:

int i = ... ; // data from an n-bit file
float f = (float) i ;
f /= M;

where M is 2^(n-1). Now, f is guaranteed to be in the range [-1,1]. After you've done your processing, you'll usually want to convert back. To do so, use the same constant and check for out of range values:

float f = ... ; // processed data
f *= M;
if( f < -M )    f = -M;      // i.e. -2^(n-1)
if( f > M - 1 ) f = M - 1;
int i = (int) f;

Distortion and Noise

It's hard to avoid distortion and noise when processing audio. In fact, unless what you are doing is trivial or represents a special case, noise and/or distortion are inevitable. The key is to minimize them, but doing so is not easy. Broadly speaking, noise happens every time you are forced to round, and distortion happens when you change values nonlinearly. We potentially created distortion in the code where we converted from a float to an integer with a range check, because any values outside the range boundary are treated differently than values inside it. The more of the signal that is out of range, the more distortion this will introduce. We created noise in the code where we lowered the volume, because we introduced round-off error when we divided by two. We also introduce noise when we convert from floating point to integer. In fact, many mathematical operations will introduce noise.

Any time you are working with integers, you need to watch out for overflows. For example, the following code will mix two input signals represented as an array of shorts. We handle overflows in the same way we did above, by clipping:

#include <limits.h>    // SHRT_MAX, SHRT_MIN

short input1[] = ... ; // filled in elsewhere
short input2[] = ... ; // filled in elsewhere
// we are assuming input1 and input2 have size SIZE or greater
short output[ SIZE ];

for( int i = 0; i < SIZE; ++i ) {
   int tmp = (int)input1[i] + (int)input2[i];
   if( tmp > SHRT_MAX ) tmp = SHRT_MAX;   // clip instead of wrapping around
   if( tmp < SHRT_MIN ) tmp = SHRT_MIN;
   output[i] = (short) tmp ;
}

If it so happens that the signal frequently "clips", then we will hear a lot of distortion. If we want to get rid of that distortion altogether, we can divide the sum by 2. This will reduce the output volume and introduce some round-off noise, but it will solve the distortion problem:

for( int i = 0; i < SIZE; ++i ) {
   int tmp = (int)input1[i] + (int)input2[i];
   tmp /= 2;               // the average of two in-range shorts is always in range
   output[i] = (short) tmp ;
}

Notes:

A few final notes:
  • For some reason, WAV files don't support signed 8-bit format, so when reading and writing WAV files, be aware that 8-bits means unsigned, but in virtually all other cases it's safe to assume integers are signed.
  • Always remember to swap the bytes if the native endian-ness doesn't match the file endian-ness. You'll have to do this again before writing.
  • When reducing the resolution of data (e.g. casting from float to int, multiplying an integer by a non-integer, etc.), you are introducing noise because you are throwing out data. It might seem as though this will not make much difference, but it turns out that for sampled data in a time series (like audio) it has a surprising impact. This impact is small enough that for simple audio applications you probably don't need to worry, but for anything studio-quality you will want to understand something called dither, which is the only correct way to solve the problem (a rough sketch follows after this list).
  • You may have come across one of these unfortunate posts, which claims to have found a better way to mix two audio signals. Here's the thing: there is no secret, magical formula that allows you to mix two audio signals and keep them both at the same original volume, but have the mix still be within the same bounds. The correct formula for mixing two signals is the one I described. If volume is a problem, you can either turn up the master volume control on your computer/phone/amplifier/whatever or use some kind of processing like a limiter, which will also degrade your signal, but not as badly as the formula in that post, which produces a terrible kind of distortion (ring modulation).
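
Since dither comes up in the notes above, here is one rough sketch of what it can look like: TPDF (triangular) dither applied when converting a float sample back to 16-bit. This is my own hedged example, not the only way to do it; real implementations vary (better random number generators, different dither amplitudes, noise shaping, and so on).

#include <stdlib.h>   /* rand, RAND_MAX */
#include <limits.h>   /* SHRT_MAX, SHRT_MIN */
#include <math.h>     /* lrintf */

/* Uniform random value in [-0.5, 0.5]; rand() is crude but fine for a sketch. */
static float rand_half( void )
{
   return (float)rand() / (float)RAND_MAX - 0.5f;
}

/* Convert a float sample in [-1, 1] to 16-bit with TPDF dither: the sum of two
   independent uniform values gives triangular noise of about +/- 1 LSB, which
   decorrelates the quantization error from the signal. */
short float_to_short_dithered( float f )
{
   float scaled = f * 32768.0f;            /* M = 2^(16-1), as above */
   scaled += rand_half() + rand_half();    /* TPDF dither */
   if( scaled > SHRT_MAX ) scaled = SHRT_MAX;
   if( scaled < SHRT_MIN ) scaled = SHRT_MIN;
   return (short) lrintf( scaled );        /* round to nearest */
}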