There are two uses of the word “compression” in the world of digital audio. The first is dynamic compression. Suppose we want to increase the volume of a track, but in doing so we make the loudest passages so loud that their signal level exceeds the maximum value the format can encode. Here we would use “dynamic compression” to selectively reduce the gain on those loudest passages so that they fit inside the available headroom. This note is not about dynamic compression. Instead, it is all about file compression.
File compression is a process that takes a computer file occupying a certain number of megabytes of storage space and manipulates it so that it occupies fewer megabytes. Ideally, but not always, this compression is lossless, meaning that identical raw data can be extracted from both the original file and the compressed file. There are two reasons for wanting to do this: to reduce the amount of storage space required to store the file, and to reduce the bandwidth required to transmit the file from one place to another within a limited amount of time.
Most of the time, we find that everyday computer files can be readily compressed. Why is this? In the software world, the format of a file is typically chosen so that the computer can write data to the file, and read data from it, efficiently. Scant regard is often paid to how efficiently the data is stored. A simple text file is an example. The basic ASCII character set needs only 7 bits per character. However, computer files are typically written in 8-bit chunks called bytes. So every time we write a character we use up 8 bits of storage when in practice we only needed 7. A simple file compression technique can use this observation to recover the unused storage space and reduce the file size by one eighth. With more complex file structures, a general-purpose strategy is not so obvious. Native music file formats are similarly inefficient.
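The one-eighth saving on ASCII text can be made concrete with a short sketch. The helper names `pack_7bit` and `unpack_7bit` are hypothetical, written only for this illustration; they pack each character's 7 bits back to back instead of spending a full byte on it, and assume the text is pure ASCII.

```python
def pack_7bit(text: str) -> bytes:
    """Hypothetical helper: pack 7-bit ASCII characters into a contiguous bit stream."""
    bitbuf = 0   # bits not yet written out
    nbits = 0    # how many bits bitbuf currently holds
    out = bytearray()
    for code in text.encode("ascii"):   # raises UnicodeEncodeError for non-ASCII text
        bitbuf = (bitbuf << 7) | code
        nbits += 7
        while nbits >= 8:               # emit whole bytes, most significant bits first
            nbits -= 8
            out.append((bitbuf >> nbits) & 0xFF)
        bitbuf &= (1 << nbits) - 1      # keep only the bits not yet emitted
    if nbits:                           # pad the final partial byte with zero bits
        out.append((bitbuf << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_7bit(packed: bytes, length: int) -> str:
    """Hypothetical helper: recover `length` characters (needed because of the padding)."""
    bitbuf = 0
    nbits = 0
    chars = bytearray()
    for byte in packed:
        bitbuf = (bitbuf << 8) | byte
        nbits += 8
        while nbits >= 7 and len(chars) < length:
            nbits -= 7
            chars.append((bitbuf >> nbits) & 0x7F)
        bitbuf &= (1 << nbits) - 1
    return chars.decode("ascii")


text = "Hello, lossless world!" * 8          # 176 characters, 176 bytes as plain ASCII
packed = pack_7bit(text)
print(len(text.encode("ascii")), len(packed))  # 176 154
assert unpack_7bit(packed, len(text)) == text  # lossless round trip
```

The decoder has to be told how many characters to expect, because the zero padding in the last byte could otherwise be misread as an extra character.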
Anybody who has used a zipping program to make a ZIP file for sending over the Internet will be familiar with lossless compression. ZIP is a general-purpose lossless file compression format. Some files, for example Bitmap (BMP) image files, compress very nicely into much smaller ZIP files. On the other hand, files such as JPG images are very seldom reduced in size at all by zipping. This is because the BMP file format stores image data very inefficiently, whereas JPG data has already been compressed very efficiently. In principle, any computer file can be reduced in size by a well-chosen lossless compression utility, unless its format was designed to store the data efficiently compressed in the first place.
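The BMP-versus-JPG contrast is easy to reproduce. ZIP archives normally use the Deflate algorithm, and Python's standard `zlib` module implements Deflate, so it stands in for a zip program here. Both inputs are assumptions made for illustration: a synthetic striped "bitmap" plays the role of a BMP file, and random bytes play the role of an already-compressed file such as a JPG, whose contents look statistically close to random.

```python
import os
import zlib

# Stand-in for an uncompressed bitmap: 100 x 100 pixels at 3 bytes per pixel,
# drawn as alternating white and blue vertical stripes 10 pixels wide.
WHITE, BLUE = b"\xff\xff\xff", b"\x00\x00\xff"
raw_image = b"".join(
    WHITE if (x // 10) % 2 == 0 else BLUE
    for y in range(100)
    for x in range(100)
)

# Stand-in for an already-compressed file: random bytes leave no redundancy
# for a lossless compressor to exploit.
noise = os.urandom(len(raw_image))

for name, data in [("bitmap-like", raw_image), ("random", noise)]:
    packed = zlib.compress(data, 9)             # level 9 = maximum compression effort
    assert zlib.decompress(packed) == data      # lossless: identical bytes come back
    print(f"{name}: {len(data)} -> {len(packed)} bytes")
```

The striped image shrinks to a tiny fraction of its 30,000 bytes, while the random data comes out slightly larger than it went in, because Deflate adds a little framing overhead when it has nothing to remove.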
In general, the more we know about a file, and about the data it contains, the more freedom we have in selecting an optimum strategy to compress it. With music files there are a number of attributes that can be exploited to achieve lossless compression. Here are two of the easier ones to describe: (i) Because music files encode a waveform, and because the waveform is not totally random (in which case it would be noise, not music), we can use the waveform’s immediate past to predict what its immediate future might look like, and instead encode the differences between the predictions and the actual values. This is used very effectively in many well-known lossless encoders. (ii) Stereo music content is dominated by centred images, which contain identical information in the left and right channels. If, instead of encoding L and R, we encode L+R and L-R, we end up with waveforms that are more readily compressed by other methods.
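Both ideas can be sketched in a few lines, assuming integer PCM samples held in Python lists; all function names are hypothetical. For (i) the sketch uses a simple fixed "straight-line" predictor, x[n] ≈ 2·x[n−1] − x[n−2]; real encoders such as FLAC offer several fixed predictors of this kind as well as adaptive ones, and then entropy-code the residuals, which this sketch does not do. For (ii) it stores L+R and L−R exactly as described above.

```python
import math


def order2_residuals(x):
    """Residuals of a fixed second-order predictor: x[n] is predicted as
    2*x[n-1] - x[n-2]. The first two samples are kept as warm-up values."""
    res = list(x[:2])
    for n in range(2, len(x)):
        res.append(x[n] - (2 * x[n - 1] - x[n - 2]))
    return res


def from_residuals(res):
    """Invert order2_residuals exactly by re-running the same predictor."""
    x = list(res[:2])
    for n in range(2, len(res)):
        x.append(res[n] + 2 * x[n - 1] - x[n - 2])
    return x


def to_sum_diff(left, right):
    """Replace L and R with L+R and L-R (L+R needs one extra bit of range)."""
    return ([l + r for l, r in zip(left, right)],
            [l - r for l, r in zip(left, right)])


def from_sum_diff(s, d):
    """Recover L and R; s and d always share parity, so // 2 is exact."""
    return ([(a + b) // 2 for a, b in zip(s, d)],
            [(a - b) // 2 for a, b in zip(s, d)])


# Hypothetical test signal: a 440 Hz tone sampled at 44.1 kHz as integers, with a
# right channel almost identical to the left (a strongly centred image).
left = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(2000)]
right = [round(0.98 * v) for v in left]

res = order2_residuals(left)
s, d = to_sum_diff(left, right)
print(max(map(abs, left)), max(map(abs, res[2:])))  # large samples, small residuals
print(max(map(abs, d)))                             # L-R is small as well
assert from_residuals(res) == left                  # prediction step is lossless
assert from_sum_diff(s, d) == (left, right)         # L+R / L-R step is lossless
```

Both transforms are exactly reversible on integers; their benefit is that the residuals and the L−R channel are much smaller numbers than the original samples, which a later coding stage can store in fewer bits.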
Despite the effectiveness of these methods, there are still realistic limits on how much a native music file can be compressed without losing data. For most music, lossless compression averages out at around 50%, meaning the compressed file is roughly half the size of the original. To reduce file sizes by more than that, it is necessary to adopt lossy compression techniques. Lossy is exactly what it says it is: in order to further reduce the file size, we take something that we think you probably can’t hear and throw it away. Lossy compression makes great use of the findings of psychoacoustics to help decide what, exactly, you ‘probably’ can’t hear. Lossy compression technology is fabulously creative, extremely clever, and very interesting, but for all that it still makes your music sound worse.
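To put the 50% figure in perspective, here is the arithmetic for a CD-quality source, which is an assumption (44,100 samples per second, 16 bits per sample, two channels). The 320 kbit/s comparison point is the highest bit rate defined for standard MP3.

```python
# Assumption: CD-quality PCM source.
sample_rate_hz = 44_100
bits_per_sample = 16
channels = 2

pcm_kbps = sample_rate_hz * bits_per_sample * channels / 1000
lossless_kbps = pcm_kbps * 0.50      # the ~50% lossless figure from the text
mp3_kbps = 320                       # highest standard MP3 bit rate
mp3_fraction = mp3_kbps / pcm_kbps

print(f"Uncompressed PCM: {pcm_kbps:.1f} kbit/s")                    # 1411.2
print(f"Typical lossless (~50%): {lossless_kbps:.1f} kbit/s")        # 705.6
print(f"320 kbit/s MP3 is {mp3_fraction:.1%} of the original size")  # 22.7%
```

Even the highest-quality MP3 setting is therefore well under half the size that typical lossless compression achieves, and that gap is what lossy techniques have to make up by discarding information.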
MP3 is the granddaddy of lossy audio compression technologies. I do not propose to go into detail about how MP3 does its thing, but at its core it makes use of a key finding of psychoacoustics: ‘masking’. Masking is the phenomenon whereby one sound makes another harder, or even impossible, to hear, and some sounds are masked far more effectively by certain sounds than by others. For example, a louder sound masks a quieter one (well, duh!). Also, a sound at one frequency effectively masks other sounds at nearby frequencies. So if we can identify and extract one element of a waveform, and determine that it is ‘masked’ by another one, then we could, for example, encode the ‘masked’ element using a much lower bit depth.
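A psychoacoustic model that decides what is masked is far beyond a short example, but the cost of encoding something at “a much lower bit depth” is easy to show. This sketch uses hypothetical helper names and a synthetic test tone: it requantizes the tone at several bit depths and measures the resulting noise. Each bit removed raises the noise floor by roughly 6 dB; a lossy encoder tries to keep that added noise below what the masking sound can hide.

```python
import math


def requantize(samples, bits):
    """Round samples in the range [-1.0, 1.0) to a uniform grid of 2**bits levels."""
    step = 2.0 / (2 ** bits)
    return [round(s / step) * step for s in samples]


def snr_db(original, approx):
    """Signal-to-noise ratio of approx relative to original, in decibels."""
    signal = sum(s * s for s in original)
    noise = sum((s - a) ** 2 for s, a in zip(original, approx))
    return 10 * math.log10(signal / noise)


# Hypothetical test tone: half of full scale, about 1 kHz, one second at 44.1 kHz.
tone = [0.5 * math.sin(2 * math.pi * 1000.3 * n / 44100) for n in range(44100)]

for bits in (16, 12, 8, 4):
    print(bits, "bits:", round(snr_db(tone, requantize(tone, bits)), 1), "dB")
```

Going from 16 to 8 bits costs roughly 48 dB of signal-to-noise ratio, which would be plainly audible on its own; the point of masking is to decide where such noise will not be noticed.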
MP3 sets about breaking the music into as many as 576 frequency bands, the contents of which are then scaled up or down according to the aforementioned psychoacoustic principles, and finally encoded using a technique called “Huffman Coding”, in which the most commonly occurring values are encoded using fewer bits than the less common values (quite simple, yet really rather clever). Using this approach we can, in effect, controllably reduce the resolution of the encoded music, reducing it more for those elements in the music which are ‘masked’, and less for those doing the masking. The Huffman codes are typically stored in one or more look-up tables, and by choosing an appropriate table we can end up with a larger or smaller effective bit rate.
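The Huffman idea itself fits in a short sketch. Note that MP3 uses predefined Huffman tables from its standard rather than building a code for each file, so this is not MP3's procedure; it builds a code from scratch with the classic Huffman algorithm (repeatedly merging the two least frequent groups) to show how frequent values end up with short codes. The function name and the input values are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(values):
    """Build a Huffman code (value -> bit string) from the values' frequencies."""
    freq = Counter(values)
    tiebreak = count()  # unique counter so heapq never has to compare dicts
    heap = [(f, next(tiebreak), {v: ""}) for v, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single distinct value still needs one bit
        return {v: "0" for v in freq}
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every code in each gains one leading bit.
        f1, _, low = heapq.heappop(heap)
        f2, _, high = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in low.items()}
        merged.update({v: "1" + c for v, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical quantized values in which zero is by far the most common.
values = [0] * 50 + [1] * 20 + [-1] * 20 + [2] * 6 + [-3] * 4
code = huffman_code(values)
bits = "".join(code[v] for v in values)
print(code)
print(len(bits), "bits, versus", 3 * len(values), "with a fixed 3-bit code")  # 190 vs 300
```

Zero gets a 1-bit code and the rare values get 4-bit codes; because no code is a prefix of another, the bit string can be decoded unambiguously without separators.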
In effect, lossy compression techniques employ far more signal processing than lossless compression in order to identify and extract the components that can be thrown away while minimizing (note, never eliminating) the audible deterioration in the perceived sound quality. For this reason, more recent encoders such as AAC (the format Apple adopted for iTunes), which are more elaborate and require more processing power than MP3, tend to sound better at equivalent bit rates.