bjorg

Thursday, July 15, 2010

I signed a deal with the devil and now he wants my soul! Tales of Record Industry Woe...

Yesterday I visited Liz Phair's web site hoping to hear this new song where she raps. Instead I was subjected to '90s-quality web design and a short note about how rebellious her new songs are because they cost her her management and record label. "You weren't supposed to hear them," it read. It reminded me of one of those conspiracy theory web sites.

Today I read an article about business practices at major labels (mistitled "RIAA Accounting," as if the RIAA did the accounting at major labels).

Every few weeks or so, I hear a story about how evil record labels are for not paying musicians or not understanding them, or something along those lines. Search the internet and you will find lots of stories about how record labels do the accounting in order to avoid paying bands what they rightfully "deserve" or would have made under different conditions, like with a smaller label, or if they had done it themselves, or if the label had done the accounting "fairly" or something like that.

Well, most of it's true, more or less. Record labels are out to make money. Plain and simple. The record business is, er, a business (newsflash!). What's most surprising to me about this is how surprising it seems to be to so many people.


Let me see if I understand correctly. You say a deal with the devil didn't work out the way you expected?

Musicians have many goals, one of which, like the labels, may be to make money. Many of the goals of musicians are at odds with the goals of the labels. When it comes to signing the contract, though, the labels have the upper hand: musicians are desperate to get signed, and the labels have been doing this for years and they know how to make bands sign a contract that works better for them than for the band.

But let's get something straight: Investing in bands is hard. Seriously, you may think you can pick the next hit band when you hear them at the bar, but there's a lot more to it than writing a catchy song or two. Can the band deal with management? Can they work in the studio? How do test audiences respond to them (this is more important than how real audiences respond to them)? Have they sought legal counsel? There are a million questions, many of which have nothing to do with the quality of their music, and it's still a crap-shoot.

And yet, record labels are doing it: they are investing in bands. In order for it to work at all, some really ugly stuff needs to happen. Crappy pop bands need to get signed. Lousy deals need to be made, and, here comes the horrible truth you don't want to hear, most bands fail by any measure. In my opinion, major labels could probably make some more money off smaller artists if they invested more, but instead they focus on the artists they think are going to be huge, because they are big organizations structured for big payoffs. The mid-sized or potentially mid-sized artist is best served by signing to a smaller label or hiring a private publicist, and either paying out of pocket, or getting money for that in the record contract. Of course, no one thinks about being a mid-sized artist when they sign a record contract.

It's not pretty, but that's music industry sausage. But, contrary to the complaining, the label doesn't want the band it's invested in to starve, either: a savvy band with good management has plenty of opportunities besides record sales to make money, and the record label has no interest in cutting into that. Usually, these sorts of proceeds are not included in these "record labels are evil" calculations, but that's probably fair, because the point of those calculations is usually to show that labels aren't paying musicians, which, by and large, they aren't. A savvy band should realize that album sales are promotion for the other items, and the label knows that other items such as touring (which, as I understand it, is increasingly becoming part of the label income, too) are promotion for the records.


A smart band will negotiate their record contract rather than just signing it. As I mentioned earlier, it's still pretty one-sided, but small things can be huge wins for the band. To use an old example I happen to remember, Primus kept the rights to their demos and made a lot of money off the sales of them when their album sold well. The fact is, contracts wouldn't suck so bad for bands if bands didn't want to be signed so badly. But bands are desperate. They want contracts, so they sign them. Bands can and sometimes do walk away from the table, but that's rare, and the record contract is the only deal on the planet where someone is going to give you significant amounts of money to record and play your music. Plus, it's cool to bitch about record labels being "evil" after you are signed. Like you had nothing to do with it. "I made a deal with the devil and now he wants my soul! Can you believe how evil he is?"

Maybe that's unfair of me, but it's worth emphasizing the flip-side of the argument, when the internet is full of "record labels are evil and screw artists over" talk. I just read one article where the author compares a record deal to a loan. It's not a loan. Loans come with a promissory note, which is a promise to pay the money back. Banks only give you a loan when they think they are going to get the money back, and if they can't get you to pay, they take your collateral, which is usually something like your house. The record label knows that there's a good chance that they are not going to get the money back. Would a bank give a band a loan to make a CD? Maybe before the housing market collapse, but even then you'd need some collateral. Like the aforementioned house. But the label isn't taking collateral. While there are many crappy things about all this, we have to appreciate that one fact. It's kinda magical.

So if it's not a loan, what is it? It's an investment contract, like venture capital. And as with venture capital, when the original investment goes big, the venture capitalist, i.e., the record label, gets a big cut. They also want a fair bit of control, ownership, and so on. And yeah, that sucks. Talk to anybody who's started a small company with venture capital about how their investors "don't get it". The best position to be in is to have a good contract from the get-go, and to do that you need as much negotiating power as you can get. That might mean turning down the first record contract that comes along, or spending some cash on a real shark to read and help you understand the contract, or maybe you need to shop your demos to the labels who are a better match for your needs, rather than just every label in town. At the end of the day it means research, hard work and compromises.

But if that still seems unfair -- and I'm not arguing it's totally fair -- I ask again: where else are you going to get money to make music and tour with your band? The fact is that labels are not signing the contract to be nice to you because they think your music is awesome and they believe in awesome music. They are signing you because they want to make money. As much money as possible. I wonder why bands forget they are making a business deal when they sign these contracts and that the other party wants something, too. It's not like it's free money: it's an investment in them as a business, and if they can't think of it that way, they shouldn't sign the contract.

Wednesday, July 14, 2010

Drupal multisite backups

I run a few sites from a single Drupal install. The Drupal website has a nice bash script for backing up a single site, and a multisite backup script written in Perl. Instead of using the Perl script, I decided to modify the single-site backup script. Their Perl script looks like it's a bit more flexible, but this more than does the job for most setups. Maybe others will find this useful. I just wrote it and did some cursory testing, but it seems like it checks out, and it's pretty simple:

#!/bin/bash
#
# Bjorn's multisite backup V1.0
#
# Full backup of a Drupal multisite install.
#
# Most multisite drupal installs have one directory tree and multiple
# sql databases. Using this script, you can back them all up.
# This is useful for a handful of sites. If you have hundreds of sites
# or something crazy like that, you will need another solution.
#
# This script is loosely based on the fullsite backup script
# from drupal.org, and suffers from some of the same disadvantages,
# such as no database/site locking, but it basically works.
#
# To use, modify the variables as required in the configuration section
# and stick this in a cron job. run it in the directory where you want
# your backups created. Please note if this file has been modified from
# the original.

# by Bjorn Roche

# copyright:
# There was no copyright notice on the original fullsite backup script,
# so I don't know what to say about this. However, I make no claims to
# this version.

# However, I do appreciate if you could leave my name on it, so that
# people know where to send corrections and enhancements to.

# Warranty:
# This script comes with absolutely no warranty, express or implied,
# including fitness for any purpose.
# It is solely your responsibility to read the code, understand it,
# and make sure the executable portion is correct and applicable to
# your application. It is provided as is, and could not and would
# not be provided at all with a warranty of any kind.
# Use it at your own risk.


# ------ start of configuration ---- #

# backup prefix
  prefix=multisite-backup

# Website Files
  webrootdir=/home/xodrupal/drupal-sites  # (e.g.: webrootdir=/home/user/public_html)

# site list
# List each site you want to backup in parentheses.
  sites=(site1.com site2.com site3.org)

# database info lists
# for each item, specify the required info in a list.
# the list items must be in the same order as the list of sites
# so that everything corresponds.
  dbname=(db1 db2 db3)
  dbhost=(db.site1.com db.site2.com db.site3.org)
  dbuser=(user1 user2 user3)
  dbpass=(pass1 pass2 pass3)


# ------ end of configuration ------ #

# set up some constants:
  datestamp=$(date +'%Y-%m-%d')
  numsites=${#sites[@]}
  startdir=$(pwd)
  tempdir=tmpbckdir-$datestamp
  tarname=$prefix-$datestamp.tgz

#
# Banner
#
echo ""
echo "bjorn's multisite backup V1.0"

#
# Create temporary working directory
#
echo " .. Setting up temporary working dir"
# -p creates both levels at once; stop if the directory cannot be created
mkdir -p "$tempdir/$prefix-$datestamp" || exit 1
echo "    done"

#
# TAR website files
#
echo " .. TARing website files in $webrootdir"
# stop if the web root is missing; otherwise tar would archive the wrong directory
cd "$webrootdir" || exit 1
tar cf "$startdir/$tempdir/$prefix-$datestamp/filecontent.tar" .
echo "    done"

#
# dump each database
#
i=0
echo " .. sqldump'ing $numsites databases:"
while [ "$i" -lt "$numsites" ]; do
 echo "dumping    user: ${dbuser[$i]}; database: ${dbname[$i]}; host: ${dbhost[$i]}..."
 cd "$startdir/$tempdir/$prefix-$datestamp" || exit 1
 # quote every expansion so passwords and names with special characters survive
 mysqldump --password="${dbpass[$i]}" --user="${dbuser[$i]}" --host="${dbhost[$i]}" --add-drop-table "${dbname[$i]}" > "${sites[$i]}-dbcontent.sql"
 echo "    done"

 i=$((i + 1))
done

#
# Create final backup file
#
echo " .. Creating final compressed (tgz) TAR file: $tarname"
cd "$startdir/$tempdir" || exit 1
tar czf "$tarname" "$prefix-$datestamp"
mv "$tarname" "$startdir"
echo "    done"


#
# Cleanup
#
echo " .. Clean-up"
# never run rm -r from an unexpected directory
cd "$startdir" || exit 1
rm -r "$tempdir"
echo "    done"


#
# Exit banner
#
echo " .. multi-site backup complete"
echo ""

Friday, April 9, 2010

Comments on Conversions

It didn't surprise me how impassioned a response I got to my post about converting audio data from integer to floating point. I posted a link to it on one mailing list and got some heated responses. Despite the extremely geeky nature of it, the fact is that I've seen this discussion on mailing lists before and it always seems to turn into a flame war. People put a lot of thought into implementing the simple conversion of audio from float to int and back and no matter what choice they make, they are invariably criticized for it, so it's only natural to be on the defensive.

While I contend that my post represents more thought and analysis (and better thought and analysis) than is available anywhere else publicly (certainly than I know of), I did not intend for it to be the be-all or end-all to the discussion, even if I implied otherwise. Some of the criticisms I received bordered on the absurd (it's true that my blog entry is not peer reviewed), while other criticisms were face-valid, but irrelevant (whether one solution is more pleasing mathematically is irrelevant if it is going to produce worse sounding results). However, digging through the criticisms, it's apparent that some things from my analysis can be improved.

To that end, I'm going to use this entry to accumulate comments and thoughts on the subject as they come up. So this is a living blog post that will be updated and revised from time to time.

April 9th 2010

- I claimed that looking at the no DSP case was a "best case" situation, and that any DSP would only make whatever distortion occurred worse. Therefore, I argued, this was the only case that needed to be considered. Not everyone agrees with this, but it's also hard to generalize DSP. It might be worth analyzing some simple DSP like volume ramping.

- I contrasted the distortion produced by using the wrong conversion method to the distortion created by not using dither. However, error produced from truncation is most objectionable with low-level signals while only high-level signals were tested, so this is not a fair comparison.

- It would be worthwhile to test conversions from 24-bit to 16-bit of several different audio source types to determine if the harmonic distortion of the (2^n) model is relevant in that case.

Friday, March 26, 2010

Java (No)FX - why one project dropped JavaFX for Java

There's a lot of FUD out there. For some reason, it seems like Java and JavaFX take a hard hit. I've heard nonsense like "no one uses Java anymore," and other such stuff. Unfortunately, some of the JavaFX FUD might be true: I recently completed a project which was little more than replacing a buggy, slow JavaFX UI for a Java applet with a fast, pure Java UI of the same applet. Thank God we did, because the applet is way better now.

The JavaFX portion was, at one point, the largest JavaFX codebase in existence, and, indeed, the first serious large JavaFX project not funded by Sun. And the company I was working with dropped JavaFX for plain old Java.

Although I can't say what the project was, I can say that this is a project that most companies would have tried to use Flash for, but, in the end, Java provided features that competitors using Flash simply cannot offer. As far as I know, this company is now the only company with these features in a browser because they went with a Java foundation.


Now keep in mind that a large part of the problem, here, is that this was really the first major JavaFX project, so it was bound to have difficulties. We had support from Sun, but I think Sun did not realize how far they had to go and how many bugs they still had in their runtime. In my opinion, nobody, neither us nor Sun, was aware of the problems of JavaFX. Despite its version number (1.1 when we started and 1.2 by the time we abandoned it), we found it to be buggy. Had the version number been 0.7, I would right now be saying JavaFX is the coolest thing ever, but the truth is that it's still a nascent technology that has yet to prove itself.

Lack of Competent Developers

Symptomatic of a new technology, we had a hard time finding quality developers. Sun gave us some leads but these developers were simply not up to the task of building our complex app. JavaFX is not a difficult programming language, so having someone in-house learn it would have been preferable, but at the time we did not have the resources.

Performance

The performance of the JavaFX portion of our app was extremely poor. The extremely performance critical portions of the app were written in Java and performed well, but almost all the graphics were written in JavaFX. Basic graphics like buttons were okay, but complex drawing was very slow. I don't know for sure if this was a result of poor design or JavaFX itself, but from the code I perused it looked like a little of both. We know JavaFX was at least partly to blame because we had performance issues even when we commented out the complex calculations.

Unstable API

Transitioning from JavaFX 1.1 to JavaFX 1.2 turned out to be very difficult because of backwards compatibility issues. Unfortunately, at a certain point in our project, our app was performing so badly with even rudimentary graphics tasks that Sun and our JavaFX developers insisted on upgrading.

Lack of Quality Control

At one point, after we released a baseline product with minimal features, Sun released a new upgrade to JavaFX. Unfortunately, this meant that JavaFX libraries would be updated on all machines that ran our app. Since there was a bug in the library, our app stopped working.

Poor Developer Communication

In an effort to work around the bug, we asked Sun for the developer version of the new library. Unfortunately, they were not forthcoming. At that point, the decision was made to stop development on the JavaFX solution and seek an alternative. After I convinced the team that Swing was capable of looking great, we developed a pure-Java alternative and have released the new product. The graphics performance is orders of magnitude better, and the look is similar to Flash. People who have seen it so far have gone out of their way to compliment the appearance of the app. The appearance matches the design spec almost perfectly except for a few things we have not yet had a chance to attend to. (I am belaboring this point because Swing has a reputation for being ugly, simply because most of the included Look and Feels are ugly.)

Future Seems to Be Too Business Oriented

When we asked Sun about the future of JavaFX, they said they plan to offer certification levels. It's bad enough that Java skills are based on memorization rather than problem solving abilities now, but I can sort of understand that given the business orientation of Java, and maybe businesses look for that sort of thing.

If hip kids don't learn Java, that's okay. There are more than enough Java programmers to keep Amazon, eBay, Oracle and all those other Java giants going. But JavaFX is not like that. JavaFX is a creative tool. It's not a direct competitor to Flash, but it's in the same vein, and those people aren't going to take a certification exam. Sun needs to think about what is going to make people think JavaFX is awesome, and certification is not it. Neither is having Gosling slinging T-Shirts at them. JavaFX really really has the potential to be awesome, and I really really want it to be, so here's what I think you need to do (Sun, I'm talking to you):

  • Show people that you are serious about making it awesome. Hire some awesome programmers and have them blog and tweet or whatever the cool kids are doing these days about what they are doing. Really. Hire the best. Hire some young folks. Hire some experienced old guns. Mix and match. Make sure you let them be honest on those blogs.
  • Talk to some cool startups that do mobile stuff, like Venmo, about what they are doing in the mobile sphere and work with them proactively.
  • Suck it up: work with the Android people. Sure they stabbed you in the back, but you need them.
  • Help out Apple: really, their JVM sucks. It seems like I am filing bug reports every week. Make sure they get it right because cool geeks use Apple.
  • While you're at it, make sure your JavaFX staff has Macs and other cool toys. Remember, you need to treat those folks as creatives, not code monkeys.

Wednesday, December 9, 2009

Linearity and dynamic range in Int->Float->Int

Update: some comments.

In my last blog post, I discussed converting audio from integer to floating point back to integer, mostly from a programming perspective. I showed how there are a lot of ways to do the conversion. Most audio folks would say, "huh, I thought there were only two ways to convert floating point numbers to integers." And they'd be right: with and without dither. So what's all the fuss about?

Indeed, that's a good question. Most audio folks have this expectation:
  1. When I have dither off and no effects (including volume, etc) I expect to be able to get out exactly what I put in.
  2. When I have dither on, I expect it to sound good.
Point 1 is what we referred to as bit transparency in the previous post, and we found lots of ways to do that. Point 2 is a bit more subtle. How do you make something sound good? In this case, we mean transparent, and what's especially critical is that we eliminate truncation and IM distortion which are the hallmarks of cold, harsh digital audio.
Figure 1. Comparison of 16-bit conversion using the same scaling factor (matched) vs. different scaling factors (mismatched). Mismatched scaling factors come from Method 3 from previous post and matched are Method 2.

What we need when it comes to transparency and avoiding that cold harsh sound is linearity. In this regard, the methods discussed in my last post, transparent or not, don't stack up equally. You might think you could judge them by inspection, but the mathematics are a bit more complex. Let's be clear about what we need to test: what we don't care about is how accurately a given conversion method responds to a DC signal: we aren't measuring the temperature or the amount of fuel in a tank. Rather, when we talk about linearity in audio we are referring to the ability to accurately translate dynamic information. Think about it: when you buy an analog-to-digital converter, you aren't concerned about its ability to accurately measure a certain input voltage, are you? No, you care about its frequency response and dynamic range. In the same way, we must ensure maximum signal-to-noise ratio and dynamic range in our conversions. It turns out not all the conversions from my last post have good dynamic performance.

Tests

It is sometimes claimed that the percent error introduced by "mismatched" conversion (i.e., Method 3 from the previous post) is small, and therefore of little concern, but percent error is not what matters in a dynamic system such as audio, so we will not concern ourselves with that and investigate the dynamic performance instead. In Figure 1 we show the results of "mismatched" conversion. In this case we are converting from a source signal of 2 sine waves in double precision to 16-bit integer (to simulate A/D conversion), then to single-precision floating point and back to 16-bit integer (to simulate a standard editing workflow), and finally back to double precision (to simulate D/A conversion). This is more or less the minimum error we can expect with the mismatched method if we use audio editing software but do not use DSP, and therefore represents a best-case scenario. In the dynamic analysis, it becomes clear that using different scaling factors produces more noise whether dither is used or not. In fact, the difference made by dither is dwarfed by the difference in techniques. Just as importantly, the quality of the noise is bad: rather than shifting the noise floor up, we see spikes indicating that the noise is likely to be audible even at low levels. These results also suggest that it is important to use the same scaling factors throughout the processing chain.
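
The editing round trip described above (16-bit integer to single-precision float and back) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce Figure 1: the two frequencies, the block length, the scale used for the simulated A/D step, and all function names are made up for the example, and it counts changed samples instead of running an FFT.

```python
import math
import struct

def f32(x):
    """Round a double to the nearest single-precision float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def clip16(v):
    """Clamp an integer to the 16-bit two's complement range."""
    return max(-0x8000, min(0x7FFF, v))

def two_sines(n, f1=0.01, f2=0.013):
    """Sum of two half-scale sines; f1 and f2 (cycles/sample) are arbitrary picks."""
    return [0.5 * math.sin(2 * math.pi * f1 * k) +
            0.5 * math.sin(2 * math.pi * f2 * k) for k in range(n)]

def count_changed(in_scale, out_scale, n=4096):
    """Count samples altered by int16 -> float32 -> int16 with the given scales."""
    changed = 0
    for x in two_sines(n):
        recorded = clip16(round(x * 0x7FFF))      # simulated A/D (assumed scale)
        as_float = f32(recorded / in_scale)       # int -> float
        restored = clip16(round(f32(as_float * out_scale)))  # float -> int, no dither
        if restored != recorded:
            changed += 1
    return changed

if __name__ == "__main__":
    print("matched    (0x7FFF / 0x7FFF):", count_changed(0x7FFF, 0x7FFF))
    print("mismatched (0x8000 / 0x7FFF):", count_changed(0x8000, 0x7FFF))
```

With matched factors no sample changes. With mismatched factors, every sample above roughly half scale comes back one LSB closer to zero, so the error depends on the signal level instead of behaving like a constant noise floor.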

Figure 2. Quantization and dithering from float to int and back to float is tested at 16 bits (a,b) and 24 bits (c,d) using a full-scale sine (a,c) and the sum of two sines (b,d). Notes: the sum of two sines does not clip; clipped signal and raw quantized signal are not shown in a.
Figure 2 shows the dynamic performance of conversion using 2^n, (2^n)-1 and "asymmetrical" conversion (i.e., Method 4 from my previous post). We will discuss below that "asymmetrical" is a misnomer. We also looked at dithered and non-dithered versions.

Two types of tests were run: first, a full-scale sine wave was generated, converted to int, and back to float for FFT analysis. The second test was the same except that two sines, each at 1/2 full scale were summed together. Each test was run at 16 and 24 bits. Note that the full-scale sine wave cannot be accurately represented in some of these conversion methods, resulting in some clipping.

As you can see, all dithered converters performed fine at 16-bit as long as nothing was out of scale. At 24-bit, the weakness of the (2^n)-1 converter becomes clear: it actually performs worse than rounding (i.e., no dithering). Clearly (2^n)-1 is not an acceptable transformation for 24-bit integers and single precision floating point numbers. The 2^n converter performed admirably on all tests except the 16-bit full-scale test (Figure 2a). Those small spikes line up perfectly with the spikes caused by clipping as expected (results not shown) meaning that it is harmonic distortion -- not the worst thing that could happen, but, still, the asymmetric converter does outperform it in this regard.

As mentioned, I'm calling Method 4 from my previous post the "asymmetric" method, but it is only asymmetric in the sense that you apply different math to positive and negative numbers. As these results show, it is linear. Moreover, it is symmetric with respect to dither amplitude, which is what ensures its linear behavior.

Conclusions

Clearly the two winners here are the so-called asymmetric method and the (2^n) method. Both methods excel in the critical areas of bit transparency and linearity. Even their un-dithered performance is quite good, and they are obviously superior to other methods.

The one area in which the asymmetric model outperforms the (2^n) model is in terms of clipping signals that originated from higher resolution. Even with dither, we still see incorrect behavior with the (2^n) model because dither only finds its way to 1/2 LSB, whereas +1 clips by going 1 LSB over. The question is whether or not this matters. Indeed there is some debate about the importance of +1. My opinion? +1 is a value that occurs in the real world and it's not always possible for the code that's producing the +1 to know what the output resolution is going to be. For example, a VST synth plugin has no way of knowing what the output resolution is going to be, so it can't be expected to know what to scale its output to. When converting from 24 bit to 16 bit and using float as an intermediary, there is no simple way to solve this problem.

On the other hand, non-pro A/D converters frequently clip around -0.5 dBFS, which is below +1 - 1 LSB anyway. Conceivably, you could also correct for this by introducing a level shift at the output equal to 1/2 LSB, but that's equivalent to turning your converter into a (2^n)-.5 converter -- it solves one problem, but introduces another. All that said, there is no reason not to develop software, especially libraries, drivers and other software intended for use by multiple types of users including audiophiles and pro audio engineers, that is convenient to use while meeting the highest audio standards: just use the asymmetric converters.
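
To make the +1 issue concrete, here is a minimal Python sketch of the two float-to-int scalings at 16 bits. The function names are hypothetical, rounding is plain round-to-nearest, and no dither is applied; it is meant as an illustration of the arithmetic only.

```python
INT16_MIN, INT16_MAX = -0x8000, 0x7FFF

def scaled_pow2(x):
    """(2^n) method: multiply by 0x8000 and round. May exceed the int16 range."""
    return round(x * 0x8000)

def scaled_asym(x):
    """Asymmetric method: 0x7FFF for positive samples, 0x8000 otherwise."""
    return round(x * (0x7FFF if x > 0 else 0x8000))

def clip16(v):
    """Clamp to the 16-bit two's complement range."""
    return max(INT16_MIN, min(INT16_MAX, v))

def clips(v):
    """True if the scaled value does not fit in 16 bits and must be clipped."""
    return v != clip16(v)

if __name__ == "__main__":
    for x in (1.0, -1.0):
        print(x, scaled_pow2(x), clips(scaled_pow2(x)),
              scaled_asym(x), clips(scaled_asym(x)))
```

Under the (2^n) scaling, a sample of exactly +1.0 lands one step above the largest 16-bit value and has to be clipped, while -1.0 fits. Under the asymmetric scaling both +1.0 and -1.0 map onto the ends of the integer range.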

Given the potential hazards found in mixing and matching conversion methods, I recommend that all libraries (and drivers, if possible) offer options for various conversion settings, both to minimize bit transparency problems and unnecessary quantization noise, until all libraries and drivers can standardize on the asymmetric conversion method. This is the only way to guarantee transparency and maximize linearity. As these results show, this issue may be more important than dither.

Wednesday, December 2, 2009

Int->Float->Int: It's a jungle out there!

It turns out that the simple operation of converting from float to integer and back is not so simple. When it comes to audio, this operation should be done with care, and most programmers do, in fact, put a lot of thought into it. The problem most programmers observe is that audio, when stored (or processed) as an integer, is usually stored in what's called "two's complement" notation, which always gives us 1 more negative number than positive. When we process or store floating point numbers, we use a nominal range of -1 to +1.

The fact that there are more negative numbers than positive numbers has caused some confusion amongst programmers, and a number of different conversion methods have been proposed. Here is my survey of how a number of existing software and hardware packages handle this conversion. In these examples, I show conversions for 16-bit integers, but they all extend in the obvious way to other bit depths. It is important to consider how these methods extend to larger integers, especially how they extend to 24-bit integers, so I've tested bit transparency for these methods up to 24-bit using single precision floating point intermediaries, correcting for the fact that IEEE allows for extended precisions to be used in computations. Endianness is irrelevant here, because everything works for big and little endian systems.

Transparency is only required or possible when the data has not been created synthetically or altered via DSP (including such simple operations as volume changes, mixing, etc). In cases where transparency is not possible, dither must be applied when converting to integer or reducing the resolution. In many software packages it is up to the end-user to make this determination and manually switch dither on or off. In my next post I will discuss dithering and linearity.


Method  Int to Float                               Float to Int*                      Transparency           Used By
0)      (integer + .5)/(0x7FFF+.5)                 float*(0x7FFF+.5)-.5               Up to at least 24-bit  DC DAC Modeled
1)      (integer / 0x8000)                         float * 0x8000                     Up to at least 24-bit  Apple (Core Audio) [1], ALSA [2], MatLab [2], sndlib [2]
2)      (integer / 0x7FFF)                         float * 0x7FFF                     Up to at least 24-bit  Pulse Audio [2]
3)      (integer / 0x8000)                         float * 0x7FFF                     Non-transparent        PortAudio [1,2], Jack [2], libsndfile [1,3]
4)      (integer>0?integer/0x7FFF:integer/0x8000)  float>0?float*0x7FFF:float*0x8000  Up to at least 24-bit  At least one high end DSP and A/D/A manufacturer [2,4]; XO Wave 1.0.3
5)      Unknown                                    float*(0x7FFF+.49999)              Unknown                ASIO [2]

*obviously, rounding or dithering may be required here.
Note that in the case of IO APIs, drivers are often responsible for conversions. The conversions listed here are provided by the API.

Method 0 is one possible method for preserving the DC accuracy of a DAC, and is included here for reference.

Edited December 6, 2009: Fixed Method 3. (0x8000 and 0x7FFF were backwards)

Sources:
1 Mailing list
2 Perusing the source code (this, of course, is subject to mistakes due to following old, conditional or optional code)
3 libsndfile FAQ goes into detail about this.
4 Personal communication.
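
For readers who want to try the bit transparency check themselves, the following Python sketch runs Methods 0 through 4 from the table over every 16-bit value. It is not the test code behind the table: it assumes plain round-to-nearest with no dither on the way back to integer, it emulates single precision by packing each intermediate result through `struct`, and the function names are invented for the example. Method 5 is left out because its int-to-float side is unknown.

```python
import struct

def f32(x):
    """Round a double to the nearest single-precision float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def make_methods(bits=16):
    """Return {method number: (int_to_float, float_to_int)} for a bit depth."""
    pos = float((1 << (bits - 1)) - 1)   # 0x7FFF at 16 bits
    neg = float(1 << (bits - 1))         # 0x8000 at 16 bits
    return {
        0: (lambda i: f32((i + 0.5) / (pos + 0.5)),
            lambda f: f32(f32(f * (pos + 0.5)) - 0.5)),
        1: (lambda i: f32(i / neg), lambda f: f32(f * neg)),
        2: (lambda i: f32(i / pos), lambda f: f32(f * pos)),
        3: (lambda i: f32(i / neg), lambda f: f32(f * pos)),
        4: (lambda i: f32(i / pos) if i > 0 else f32(i / neg),
            lambda f: f32(f * pos) if f > 0 else f32(f * neg)),
    }

def count_mismatches(method, bits=16):
    """Count integers that do not survive int -> float -> int unchanged."""
    to_float, to_int = make_methods(bits)[method]
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return sum(1 for i in range(lo, hi + 1)
               if round(to_int(to_float(i))) != i)

if __name__ == "__main__":
    for m in range(5):
        print("method", m, "mismatches:", count_mismatches(m))
```

At 16 bits, Methods 0, 1, 2 and 4 return every value unchanged, while Method 3 does not; for example, 32767 comes back as 32766. Passing `bits=24` repeats the check at 24 bits, though the pure-Python loop over about 16.7 million values is slow.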

Wednesday, November 11, 2009

WAVE64 vs RF64 vs CAF

Right now I am choosing a new default internal audio file format for XO Wave, and I'd like to choose a format that offers large file sizes and high resolution. I'd like to use an existing popular standard rather than inventing my own or using RAW audio. The pro audio industry is finally moving towards 64-bit file formats, and the three options supported by most pro software are:

  • Wave64, aka Sony Wave64, originally developed by Sonic Foundry before 2003, is an open standard and a true 64-bit format: all 32-bit fields are replaced with 64-bit fields, and all chunks are 8-byte word aligned. Instead of the dreaded FourCC it uses GUID. Other than that, it is pretty much the same as WAV, so the spec is barely 4 pages long, although in my opinion it could stand to be a bit longer, as many aspects of WAV are so poorly devised it really wouldn't hurt for someone to put it all in one place. Some people have criticized the use of GUID on the grounds that there will never be that many chunks, but this misses the point: the point of using GUIDs is that anyone can define their own chunk without having to check with Sony or register a chunk ID. It's actually rather clever.
  • RF64 was proposed in 2005 by the EBU with full knowledge of Wave64. Although the proposal stated basic requirements that could have easily been met by a few minor extensions to Wave64, and they stated a desire to "join forces" with the developers of Wave64, they made no effort to do so other than to say they hoped they'd be involved. Moreover, the same document proposes RF64 as an alternative, incompatible 64-bit extension to the WAV format. Unlike Wave64, RF64 is not a true 64-bit format. All existing "chunks" remain 32-bit, so, for example, markers, regions and loops will no longer work past a certain number of samples. Even EBU's levl chunk will not work with RF64 because it uses a 32-bit address for pointing to the "peak-of-peaks" in the raw data. RF64 offers the much made-of promise of backwards compatibility via a "junk chunk", but, of course, this is possible with Wave64 as well, as pointed out in the Wave64 spec.
  • CAF, or Core Audio Format, was Apple's entry into the ring. Apple didn't want to be left out of the 64-bit game, after all, and around the same time in 2005 they released CAF. Since they are Apple, they figured people would adopt it (Logic would, if no one else), even if there were competing specs. Their approach, however, was to start from scratch, and it's pretty refreshing. Indeed, the spec addresses practical issues to ensure that important features are implemented, and it even makes that tiny little bit of extra effort required to avoid file corruption by not requiring a header rewrite to finalize a recording of unknown length. (Anyone who's ever recorded using software knows that once in a while something goes wrong and a file ends up corrupted. It's so nice that someone finally addressed this in a spec.)
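
To put rough numbers on the 32-bit limits mentioned above, here is a small Python sketch. It assumes unsigned 32-bit size and sample-offset fields and uncompressed PCM. The function names are hypothetical and not part of any of the three specs. The last helper shows the kind of 8-byte alignment Wave64 asks for.

```python
def max_data_seconds_32bit(sample_rate, channels, bytes_per_sample):
    """Seconds of PCM audio that fit before a 32-bit byte count overflows."""
    return 2**32 / (sample_rate * channels * bytes_per_sample)


def max_marker_seconds_32bit(sample_rate):
    """Latest position (in seconds) that a 32-bit sample offset can point to."""
    return 2**32 / sample_rate


def pad_to_8(n):
    """Round a byte count up to the next multiple of 8."""
    return (n + 7) & ~7


if __name__ == "__main__":
    # 24-bit stereo at 96 kHz: a 32-bit size field runs out after about 2 hours.
    print(max_data_seconds_32bit(96000, 2, 3) / 3600)
    # A 32-bit marker offset at 96 kHz stops working after about 12.4 hours.
    print(max_marker_seconds_32bit(96000) / 3600)
    print(pad_to_8(13))  # 16
```

The second number is the one that matters for RF64: the audio data itself can grow past 4 GB, but any chunk that still addresses samples with 32 bits cannot follow it all the way.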
The WAVE format is problematic in many, many ways. For example, in some places it uses zero-based indexing, in others it uses one-based indexing. Sometimes it uses signed integers for raw audio data, other times unsigned. That may not seem so bad, but consider how simple the data it's trying to carry is. When you add to that the fact that Microsoft had to use format extensions just to clear up ambiguous documentation (and they've still got an ambiguously documented "fact" chunk), it's really not good territory. It is a shame that both Sonic Foundry/Sony and the EBU chose WAVE as the format to extend. Moreover, it's annoying that EBU designed their own, incompatible 64-bit extension to WAVE when a superior one already existed.

Some people think the whole "backwards compatibility" thing is a bunch of hooey because it puts an undue burden on the people writing the libraries. Erik de Castro Lopo, author of the popular LGPL'ed libsndfile, says:

Quite honestly, its stuff like this that makes me think the people who write these specs smoke crack!
If I were to follow the ... insane advice [about retaining backwards compatibility], the test suite would have to write > 4Gig files in order to write a real RF64 file instead of just a normal WAV file.
In order to avoid this insanity, libsndfile, when told to write an RF64 file does exactly as its told.
I would add that the backwards compatibility adds another point of failure in the recording process, in the same way that header rewrites are a point of failure in most current formats (except for CAF and "chunkless" formats like RAW and AU).

All that aside, RF64 is gaining some popularity and support -- probably more than Wave64. As for CAF, it's less popular, but since it's an Apple standard it's probably not going anywhere even if it's not going to be the "next big thing." It could be a fine place to work from, but just scanning the docs, I ran into a few issues that worried me. For example:



  • The CAFMarker data-type has three design flaws I noticed. One is that the frame position is a floating point number. I might be missing something here, but in a format where everything else counts frames and bytes as 64-bit integers, why are we suddenly using floats? Sure, that will be integral to pretty big numbers since it's 64-bit, but it's still a float. I didn't use a format like this to get pretty accurate big numbers when I could get completely accurate big numbers! Internally, most apps are going to be converting 64-bit integers to 64-bit floats, which is insane. Another problem is mChannel, which is the channel (starting at 1) that the marker refers to, or zero if the marker refers to all channels. Okay, seems reasonable, except that the spec also defines a channel mapping with a 32-bit channel layout bitmask. Why not use that? Granted, you might have more than 32 channels, but that's not going to be the most common case, and you could give your users a choice. Consistency is important in APIs. Also, let's face it, the CAFMarker, if not all the basic chunks, should be versioned and extensible. Sure, all that takes a few more bits (well, not the float/integer thing), but it's really nothing compared to the sea of data in most audio files.
  • In the SMPTE timecode types they define kCAF_SMPTE_TimeType30Drop. Now, the fact is that there's really no such thing as 30 Drop, but I can see an argument for including it for completeness. However, the documentation describes it as "30 video frames per second, with video-frame-number counts adjusted to ensure that the timecode matches elapsed clock time," which is wrong. If you actually had 30 Drop it would run ahead of elapsed, or "wall-clock", time. "Aha!" you say, "they really mean 29 Drop, which is often just called 30 Drop because everyone knows there's no such thing as 30 Drop." But I'm afraid you are wrong, because there's another constant for that, kCAF_SMPTE_TimeType2997Drop, with pretty much the same documentation, only in this case it's correct to say that the timecode matches elapsed time (well, it's very close anyway).
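
Both complaints can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are invented, and it assumes the usual drop-frame rule of skipping two frame numbers at the start of every minute except each tenth minute. The first part shows where a 64-bit float stops representing every integer frame position. The second shows how long it takes a drop-frame counter to read one hour at 29.97 fps versus a true 30 fps.

```python
from fractions import Fraction

# A 64-bit float has a 53-bit significand, so integers above 2**53
# are no longer all representable.
FLOAT64_EXACT_LIMIT = 2**53


def drop_frame_labels_per_hour(nominal_fps=30, dropped_per_minute=2):
    """Frame numbers actually used in one hour of drop-frame timecode."""
    minutes_with_drop = 60 - 6  # every minute except minutes 0, 10, ..., 50
    return nominal_fps * 3600 - dropped_per_minute * minutes_with_drop


def seconds_until_timecode_reads_one_hour(frame_rate):
    """Wall-clock seconds needed for the drop-frame counter to reach 01:00:00."""
    return Fraction(drop_frame_labels_per_hour()) / Fraction(frame_rate)


if __name__ == "__main__":
    # Past the limit, neighbouring frame positions collapse to the same float.
    print(float(FLOAT64_EXACT_LIMIT + 1) == float(FLOAT64_EXACT_LIMIT))  # True

    ntsc = Fraction(30000, 1001)  # the rate usually written as 29.97
    print(float(seconds_until_timecode_reads_one_hour(ntsc)))  # just under 3600
    print(float(seconds_until_timecode_reads_one_hour(30)))    # 3596.4
```

At 29.97 fps the counter reads one hour a few milliseconds early, which is the "very close" case. At a true 30 fps it reads one hour after only 3596.4 seconds, so it runs 3.6 seconds ahead of the wall clock every hour. In fairness to the float, 2**53 frames is over a thousand years of audio even at 192 kHz, so the objection is about consistency more than about losing markers in practice.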
So CAF might be flawed, but probably no more so than WAVE and anything built on it. The reliability factor is sweet. Really. The fact that many people, especially in broadcast, seem to be wanting RF64 support is a detraction, though.

Of course, I might just be over-engineering it. The AU format has been around forever, is super simple and provides high resolution, uncompressed audio of ANY length (it's not even limited to 64-bit). Of course, it lacks metadata which might be useful for BWF-style info as well as region data, but hey, it's wicked simple.
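
For a sense of just how simple AU is, here is a hedged Python sketch that builds a minimal header. It assumes the commonly documented layout of six big-endian 32-bit words (magic, data offset, data size, encoding, sample rate, channels), with 0xFFFFFFFF meaning "size unknown" and encoding 4 meaning 24-bit linear PCM. The constant and function names are made up for illustration.

```python
import struct

AU_MAGIC = 0x2E736E64          # the bytes ".snd"
AU_UNKNOWN_SIZE = 0xFFFFFFFF   # data size not known when the header is written
AU_ENCODING_PCM24 = 4          # 24-bit linear PCM (assumed encoding code)
AU_HEADER_SIZE = 24            # six 32-bit words, no annotation field


def au_header(sample_rate, channels,
              encoding=AU_ENCODING_PCM24, data_size=AU_UNKNOWN_SIZE):
    """Return a minimal AU header as bytes (big-endian, no annotation)."""
    return struct.pack(">6I", AU_MAGIC, AU_HEADER_SIZE, data_size,
                       encoding, sample_rate, channels)


if __name__ == "__main__":
    header = au_header(96000, 2)
    print(len(header), header[:4])  # 24 b'.snd'
```

Because the size can be left as "unknown", a recorder can write this header once and then simply append samples, with no header rewrite at the end.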


An interesting side note is that by choosing an appropriately sized junk/empty chunk in the header, Wave64, RF64 and CAF can actually be converted from one to another in-place.