Wednesday, December 9, 2009

Linearity and dynamic range in Int->Float->Int

Update: some comments.

In my last blog post, I discussed converting audio from integer to floating point back to integer, mostly from a programming perspective. I showed how there are a lot of ways to do the conversion. Most audio folks would say, "huh, I thought there were only two ways to convert floating point numbers to integers." And they'd be right: with and without dither. So what's all the fuss about?

Indeed, that's a good question. Most audio folks have this expectation:
  1. When I have dither off and no effects (including volume, etc) I expect to be able to get out exactly what I put in.
  2. When I have dither on, I expect it to sound good.
Point 1 is what we referred to as bit transparency in the previous post, and we found lots of ways to do that. Point 2 is a bit more subtle. How do you make something sound good? In this case, we mean transparent, and what's especially critical is that we eliminate truncation and IM distortion which are the hallmarks of cold, harsh digital audio.
Figure 1. Comparison of 16-bit conversion using the same scaling factor (matched) vs. different scaling factors (mismatched). Mismatched scaling factors come from Method 3 from previous post and matched are Method 2.

What we need when it comes to transparency and avoiding that cold harsh sound is linearity. In this regard, the methods discussed in my last post, transparent or not, don't stack up equally. You might think you could judge them by inspection, but the mathematics are a bit more complex. Let's be clear about what we need to test: what we don't care about is how accurately a given conversion method responds to a DC signal: we aren't measuring the temperature or the amount of fuel in a tank. Rather, when we talk about linearity in audio we are referring to the ability to accurately translate dynamic information. Think about it: when you buy an analog-to-digital converter, you aren't concerned about its ability to accurately measure a certain input voltage, are you? No, you care about it's frequency response and dynamic range. In the same way, we must ensure maximum signal-to-noise ratio and dynamic range in our conversions. It turns out not all the conversions from my last post have good dynamic performance.


It is sometimes claimed that the percent error introduced by "mismatched" conversion (ie Method 3 from the previous post) is small, and therefore of little concern, but percent error is not what matters in a dynamic system such as audio, so we will not concern ourselves with that and investigate the dynamic performance instead. In Figure 1 we show the results of "mismatched" conversion. In this case we are converting from a source signal of 2 sine waves in double precision to 16-bit integer (to simulate A/D conversion), then to single-precision floating point and back to 16-integer (to simulate a standard editing workflow), and finally back to double precision (to simulate D/A conversion). This is more or less the minimum error we can expect with the mismatched method if we use audio editing software but do not use DSP, and therefore represents a best-case scenario. In the dynamic analysis, it becomes clear that using different scaling factors produces more noise whether dither is used or not. In fact, the difference made by dither is dwarfed by the difference in techniques. Just as importantly, the quality of the noise is bad: rather than shifting the noise floor up, we see spikes indicating that the noise is likely to be audible even at low levels. These results also suggests that it is important to use the same scaling factors throughout the processing chain.

Figure 2. Quantization and dithering from float to int and back to float is tested at 16 bits (a,b) and 24 bits (c,d) using a full-scale sine (a,c) and the sum of two sine (b,c). Notes: the sum of two sines does not clip; clipped signal and raw quantized signal are not shown in a.
Figure 2. shows the dynamic performance of conversion using 2^n, (2^n)-1 and "asymmetrical" conversion (ie Method 4 from my previous post). We will discuss below that "asymmetrical" is  a misnomer. We also looked at dithered and non-dithered versions.

Two types of tests were run: first, a full-scale sine wave was generated, converted to int, and back to float for FFT analysis. The second test was the same except that two sines, each at 1/2 full scale were summed together. Each test was run at 16 and 24 bits. Note that the full-scale sine wave cannot be accurately represented in some of these conversion methods, resulting in some clipping.

As you can see, all dithered converters performed fine at 16-bit as long as nothing was out of scale. At 24-bit, the weakness of the (2^n)-1 converter becomes clear: it actually performs worse than rounding (ie. no dithering). Clearly (2^n)-1 is not an acceptable transformation for 24-bit integers and single precision floating point numbers. The 2^n converter performed admirably on all tests except the 16-bit full-scale test (1a). Those small spikes line up perfectly with the spikes caused by clipping as expected (results not shown) meaning that it is harmonic distortion -- not the worst thing that could happen, but, still, the asymmetric converter does outperform it in this regard.

As mentioned, I'm calling Method 4 from my previous post the "asymmetric" method, but it is only asymmetric in the sense that you apply different math to positive and negative numbers. As these results show, it is linear. Moreover, it is symmetric with respect to dither amplitude, which is what ensures its linear behavior.


Clearly the two winners here are the so-called asymmetric method and the (2^n) method. Both methods excel in the critical areas of bit transparency and linearity. Even their un-dithered performance is quite good, and they are obviously superior to other methods.

The one area in which the asymmetric model outperforms the (2^n) model is in terms of clipping signals that originated from higher resolution. Even with dither, we still see incorrect behavior with the the (2^n) model because dither only finds its way to 1/2 LSB, whereas +1 clips by going 1 LSB over. The question is whether or not this matters. Indeed there is some debate about the importance of +1. My opinion? +1 is a value that occurs in the real world and it's not always possible for the code that's producing the +1 to know what the output resolution is going to be. For example, a VST synth plugin has no way of knowing what the output resolution is going to be, so it can't be expected to know what to scale its output to. When converting from 24 bit to 16 bit and using float as an intermediary, there is no simple way to solve this problem.

On the other hand, non-pro A/D converters frequently clip around -.5 dBFS, which is below +1 - 1 LSB anyway. Conceivably, you could also correct for this by introducing a level shift at the output equal to 1/2 LSB, but that's equivalent to turning your converter into a (2^n)-.5 converter -- it solves one problem, but introduces another. All that said, there is no reason not to develop software, especially libraries, drivers and other software intended for use by multiple type of users including audiophiles and pro audio engineers, that is convenient to use while meeting the highest audio standards: just use the asymmetric converters.

Given the potential hazards found in mixing and matching conversion methods, I recommend that all libraries (and drivers, if possible) offer options for various conversion settings, both to minimize bit transparency problems and unnecessary quantization noise, until all libraries and drivers can standardize on the asymmetric conversion method. This is the only way to guarantee transparency and maximize linearity. As these results show, this issue may be more important than dither.

Wednesday, December 2, 2009

Int->Float->Int: It's a jungle out there!

It turns out that the simple operation of converting from float to integer and back is not so simple. When it comes to audio, this operation should be done with care, and most programmers do, in fact, put a lot of thought into it. The problem most programmers observe is that audio, when stored (or processed) as an integer, is usually stored in what's called "two's complement" notation, which always gives us 1 more negative number than positive. When we process or store floating point numbers, we use a nominal range of -1 to +1.

The fact that there are more negative numbers than positive numbers has caused some confusion amongst programers, and a number of different conversion methods have been proposed. Here is my survey of how a number of existing software and hardware packages handle this conversion. In these examples, I show conversions for 16-bit integers, but they all extend in the obvious way to other bit depths. It is important to consider how these methods extend to larger integers, especially how they extend to 24-bit integers, so I've tested bit transparency for these methods up to 24-bit using single precision floating point intermediaries, correcting for the fact that IEEE allows for extended precisions to be used in computations. Endianness is irrelevant here, because everything works for big and little endian systems.

Transparency is only required or possible when the data has not been created synthetically or altered via DSP (including such simple operations as volume changes, mixing, etc). In cases where transparency is not possible, dither must be applied when converting to integer or reducing the resolution. In many software packages it is up to the end-user to make this determination and manually switch dither on or off. In my next post I will discuss dithering and linearity.

Int to Float
Float to Int*
Used By
((integer + .5)/(0x7FFF+.5)
Up to at least 24-bit
DC DAC Modeled
(integer / 0x8000)
float * 0x8000
Up to at least 24-bit
Apple (Core Audio)1, ALSA2, MatLab2, sndlib2
(integer / 0x7FFF)
float * 0x7FFF
Up to at least 24-bit
Pulse Audio2
(integer / 0x8000)
float * 0x7FFF
PortAudio1,2, Jack2, libsndfile1,3
Up to at least 24-bit
At least one high end DSP and A/D/A manufacturer.2,4 XO Wave 1.0.3.
*obviously, rounding or dithering may be required here.
Note that in the case of IO APIs, drivers are often responsible for conversions. The conversions listed here are provided by the API.

Method 0 is one possible method for preserving the DC accuracy of a DAC, and is included here for reference.

Edited December 6, 2009: Fixed Method 3. (0x8000 and 0x7FFF were backwards)

1 Mailing list
2 Perusing the source code (this, of course, is subject to mistakes due to following old, conditional or optional code)
3 libsndfile FAQ goes into detail about this.
4 Personal communication.