I want to go over some potentially obvious stuff mainly because I want it to set the table for tomorrow’s post.  You have all heard of the old chestnut where, if you focus clearly, it is often possible to pick out one individual conversation from among the hubbub of a noisy cocktail party.  There may be a hundred people all talking at the same volume.  Together, this forms the noise, and it sounds like we have a hundred times more Noise than the Signal which we are trying to extract from it.  Clearly, the Noise overwhelms the Signal.  Yet most of us have already performed this social experiment, so we know it is not that hard to do.  What, then, is going on here?

To understand this, we need to go back to the concept of noise.  What exactly is Noise, and what makes it different from Signal?  Basically, noise occurs whenever what we are observing appears to be random.  Consider a sequence of random numbers.  What makes them random is that we can discern no pattern or sequence within them, regardless of the level of analytical sophistication, whether real or hypothetical, that we can bring to bear upon them.  If any such pattern can be established, then the numbers are no longer random.  Actually, generating truly random numbers is an astonishingly challenging task, as any expert in cryptography will tell you.  In audio, if the signal – whether an analog signal or a digital representation thereof – is totally random, then it consists entirely of noise.

Having said that, there are many different flavours of random.  For example, we can generate a sequence of random numbers that lie between 0 and 1.  Or between -10 and +10.  The other interesting thing is that we can generate random numbers where all the different numbers do not actually have the same chance of appearing.  But is that random, you ask?  Yes it is, and here is an experiment you can do yourself.  Toss two coins (we will assume this to be a truly random process).  Repeat this as often as you like and make a tally of the outcomes.  Two heads or two tails will each appear about a quarter of the time.  But the combination of a head and a tail will appear about half the time.  In audio, the equivalent is what we call Noise Colours.  The noise signal itself may be random, but its frequency content can have any distribution that we like.  For example, White Noise has equal power at all frequencies, whereas Pink Noise has progressively less power the higher the frequency goes.
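If you would rather not toss coins all afternoon, the experiment can be roughly simulated.  This is only a sketch: the function name `tally_two_coins` is my own invention, and Python's pseudo-random generator stands in for a truly random coin.

```python
import random
from collections import Counter


def tally_two_coins(tosses, seed=0):
    """Toss two fair coins `tosses` times; return the fraction of each head count."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(tosses):
        heads = rng.randint(0, 1) + rng.randint(0, 1)  # 0, 1 or 2 heads
        counts[heads] += 1
    return {heads: counts[heads] / tosses for heads in (0, 1, 2)}


fractions = tally_two_coins(100_000)
for heads, fraction in fractions.items():
    print(f"{heads} head(s): {fraction:.3f}")
```

Every toss is random, yet the tally settles close to a quarter, a half and a quarter.  The outcomes are random without being equally likely.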

A signal at a certain frequency can only be separated from the noise if its magnitude is higher than the magnitude of the fraction of the noise which is at that frequency.  Let’s go back to the cocktail party.  A hundred people are talking, all at the same volume.  But you are only interested in your boss, who is talking to the company chairman.  You can hear him discussing his thoughts on the new Vice President, an appointment that everyone expects to be announced soon.  Your boss’s voice is like one frequency component in an audio spectrum, where all the other people’s voices represent other frequency components.  By concentrating only on your boss’s frequency, you can tune out all the other frequencies and listen in on his conversation.  Provided, that is, his voice stays above the residual background noise at that frequency.  So, finally the boss leans forward, and lowering his voice, tells the chairman who the new Vice President will be.  But – damn it all! – by lowering his voice, he has reduced it below the level of the residual background noise.  And you can no longer make out what he says.  But at least there is a lesson to take away.  When the signal drops below the overall noise level, it is still possible to recover it.  But when it drops below the level of that component of the noise which is at the frequency of the signal, then it is irretrievably lost.  If the signal is fainter than the noise at its own frequency, it simply means that what you are listening to is indistinguishable from being random.  Your only option is to change the way you measure the signal.

So how do we know what the level of the signal is at a particular frequency, and how do we know what the background noise is?  The mathematical tool we use to analyze the frequency content of a signal is the Fourier Transform.  It is called a Transform, because the original audio data is transformed into something that bears no immediately obvious resemblance to it, and yet contains all of the information necessary to enable it to be transformed back into the exact original data.  If you want to see what the math looks like, look it up on Wikipedia!  The Fourier Transform of an audio signal turns out to be a representation of the frequency content of the audio signal.  It is a mathematically exact representation.  If there is any frequency information that cannot be precisely extracted from the Fourier Transform, this is simply because that information does not actually exist in the original signal.  Conversely, if you see something in the Fourier Transform, then, whether you like it or not, that means it is also in the original signal.
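For those who prefer code to Wikipedia, here is a minimal sketch of the round trip.  It uses the discrete form of the transform, written directly from its textbook definition rather than any fast algorithm, and the helper names `dft` and `inverse_dft` are mine.  The test signal, two tones that sit on bins 5 and 12, is an arbitrary choice for illustration.

```python
import cmath
import math


def dft(samples):
    """Discrete Fourier Transform, straight from the definition (slow but clear)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Inverse transform; returns the real part, since the input signal was real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


n = 64
samples = [math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.cos(2 * math.pi * 12 * t / n)
           for t in range(n)]

spectrum = dft(samples)
restored = inverse_dft(spectrum)

# Largest difference between the original data and the round-tripped data.
round_trip_error = max(abs(a - b) for a, b in zip(samples, restored))

# The two strongest bins in the lower half of the spectrum.
strongest_bins = sorted(range(n // 2), key=lambda k: abs(spectrum[k]), reverse=True)[:2]

print(round_trip_error, strongest_bins)
```

The spectrum looks nothing like the waveform, yet the original samples come back to within floating-point rounding, and the two tones we put in are the two strongest bins that come out.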

Taking our noisy cocktail party analogy, we can see what is necessary for us to identify a signal within a noisy environment.  We have to strip everything away that we can identify as not being part of the bit of the signal we are interested in, and focus just on those aspects of the data that could actually be the signal.  Provided we limit our thinking to the frequency domain, we can think this through quite nicely.  Within the data, we will be able to identify the presence of a signal at a certain frequency, but only if the magnitude of the signal is higher than the magnitude of all of the noise that is within a narrow band of frequencies surrounding the one we are looking for.  And we can use a Fourier Transform to see whether that is in fact the case.
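Here is a rough sketch of the cocktail party in numbers.  A single tone is buried in white noise carrying about a hundred times its power, and we then look only at the tone's own frequency bin and its immediate neighbours.  The bin number, the record length and the helper `bin_magnitude` are all assumptions made for the sake of illustration.

```python
import cmath
import math
import random


def bin_magnitude(samples, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n = len(samples)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(samples)))


rng = random.Random(1)              # fixed seed so the run is repeatable
n = 8192
tone_bin = 400                      # the "boss's frequency" (hypothetical choice)
amplitude = math.sqrt(2) * 0.1      # tone power = amplitude**2 / 2 = 0.01

tone = [amplitude * math.sin(2 * math.pi * tone_bin * t / n) for t in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]   # white noise, power about 1
mixture = [s + w for s, w in zip(tone, noise)]

signal_power = sum(s * s for s in tone) / n
noise_power = sum(w * w for w in noise) / n

# Look only at the tone's bin and the 20 bins on either side of it.
tone_level = bin_magnitude(mixture, tone_bin)
neighbour_levels = [bin_magnitude(mixture, k)
                    for k in range(tone_bin - 20, tone_bin + 21) if k != tone_bin]

print(f"noise power / signal power: {noise_power / signal_power:.0f}")
print(f"tone bin: {tone_level:.0f}, loudest neighbour: {max(neighbour_levels):.0f}")
```

Overall the noise swamps the tone, but the noise power is spread across thousands of bins while the tone's power all lands in one, so in that bin the tone stands clear of its neighbours.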

It sounds like a whole load of stuff and nonsense, but tomorrow we’ll look at a practical example.

THIS ITEM HAS NOW BEEN SOLD.  🙂

We are selling our Stello U3 USB-to-SPDIF converter.  We bought it for our evaluation bench a little over 18 months ago.  We have been using the U3 whenever we evaluate a new DAC which supports a more limited range of sample rates over USB than over S/PDIF.  These days, there is no reason for anybody to do that any more.

The Stello U3 has proven itself to be at least the equal of anything similar we have evaluated at up to three times the price, and in conjunction with DAC hardware up to and including the Light Harmonic Da Vinci.  If you have a legacy DAC which is designed to give its best performance over an AES/EBU or Coaxial S/PDIF interface, you should be using a device such as the Stello U3.  The Stello connects to the Computer’s USB output, and delivers the audio signal via your choice of AES/EBU or Coaxial S/PDIF.

Our U3 unit is in mint condition.  It only comes out of its box when needed for testing purposes.  We are offering it for sale in its original packaging for US$300, plus shipping.  (We are located in Montreal, Canada, so you can estimate shipping costs accordingly.)

If you are interested, please e-mail me.

Many digital audio Apps – BitPerfect included – have the ability to apply dither to the audio signal.  Many of those Apps provide a great deal of control over the type of dither employed, and at what stage it is added.  BitPerfect does not.  The reason is that when dither is applied it needs to be applied for very good reasons, in circumstances that indicate a requirement for dither, and using an appropriate choice of dithering algorithm.  BitPerfect allows you to choose between two types of dither – an unidentified algorithm provided by CoreAudio, and a BitPerfect-implemented TPDF (Triangular Probability Density Function) dither.  BitPerfect then decides when – and if – this dither should be applied.  It is done this way because experience shows very clearly that users of those Apps which offer in-depth user control over dithering routinely exercise that control unwisely.

So here is a brief tutorial on dither.  What it is, why we do it, and how it works.

Dither has its roots in dropping bombs during WWII.  Elaborate mechanical contraptions were devised to enable the bombardiers to aim their bombs as accurately as possible when dropping them from bombers while getting shot at.  In the safe confines of the engineering lab, the engineers could not get these devices to work with sufficient accuracy.  But wartime needs being expedient, they were installed into bombers anyway and pressed into service, where, much to the surprise of the engineers, they proved to be far more accurate than expected.  It turned out that the elaborate mechanisms were rather “sticky” in operation, and when installed on a bomber, the immense constant vibration jogged the mechanisms out of their “sticky” positions and caused them to function properly.  Those of you who still like to knock on an old analog meter before taking a reading are doing exactly the same thing.  Back in the lab, the engineers mounted the bombing aids onto a vibrating table, and all of a sudden were able to replicate the excellent in-service performance.  They termed this forced vibration “dither”.

Dither is now a well-used art in digital signal processing, in video as well as audio.  I am going to focus on one particular aspect – the most important of the audio applications.  When a signal is digitized (and I will use the term ‘quantized’ from here on in), it is in effect assigned a very specific value, which, as often as not, is a measure of the amplitude of a voltage.  Because quantization means assigning the magnitude of the voltage to one of a limited number of fixed levels, it is inevitably the case that there is some residual error between the actual value of the voltage and the stored quantized value.  This error is called the quantization error, and what is typically done is to choose the quantization level which is closest to the actual value of the voltage, thereby minimizing the quantization error.  Are you with me so far?
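To make that concrete, here is a small sketch of nearest-level quantization.  The 16-bit depth, the ±1.0 range and the function name `quantize` are my own choices for illustration.

```python
import math


def quantize(value, step):
    """Snap a value to the nearest multiple of `step` (the nearest quantization level)."""
    return step * math.floor(value / step + 0.5)


bits = 16
step = 2.0 / 2 ** bits        # spacing of the levels across a -1.0..+1.0 range

# One cycle of a sine wave, standing in for the voltage being measured.
samples = [0.8 * math.sin(2 * math.pi * t / 1000) for t in range(1000)]

# Quantization error: stored value minus actual value, for each sample.
errors = [quantize(x, step) - x for x in samples]
max_error = max(abs(e) for e in errors)

print(f"step = {step:.3e}, largest error = {max_error:.3e}")
```

Because we always pick the closest level, the error can never be larger than half of one step.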

It turns out that minimizing the quantization error is not necessarily the best way to go.  This method produces a quantization error signal that correlates quite well with the original signal.  In plain English, this means that the quantization error signal looks more like distortion than it does noise.  And we know that the human ear is far more tolerant to noise than it is to distortion.  But let’s stop to think about this.  Done this way, the quantization error has a magnitude which is never more than one half of the magnitude of the least significant bit.  So it will only correlate with the original signal (and therefore produce distortion) if the original signal is sufficiently clean that when looked at with a magnifying glass that sees all the way down to the level of the least significant bit, the signal contains no additional noise.  But if the original signal does contain noise, and if the noise is of a magnitude that swamps the least significant bit, then the quantization error can only correlate with the noise and not with the signal, and the resultant quantization error signal will only comprise noise and no distortion.  Still with me?

So, if the original signal is clean and contains no noise, all we need to do is add some noise of our own, and any distortion components present in the quantization error signal will be replaced by noise components.  Although the magnitude of the noise we need to add turns out to be larger than the magnitude of the original distortion components, this noise turns out to be more pleasing on the ear.  A lot more pleasing, actually.  This added noise is what we call dither.
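A toy illustration may help here.  Take a perfectly clean, steady input of 0.3 of an LSB and quantize it with and without added noise.  I have used triangular (TPDF) noise spanning ±1 LSB as one reasonable choice; the helper names are hypothetical, and this is a sketch rather than anybody's production code.

```python
import math
import random


def tpdf(rng):
    """Triangular dither spanning -1..+1 LSB: the sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


def quantize(value):
    """Round to the nearest whole LSB."""
    return math.floor(value + 0.5)


rng = random.Random(2)   # fixed seed so the run is repeatable
level = 0.3              # a steady input of 0.3 LSB, too small to register on its own
count = 100_000

plain = [quantize(level) for _ in range(count)]
dithered = [quantize(level + tpdf(rng)) for _ in range(count)]

plain_mean = sum(plain) / count        # stuck at 0: the input has vanished
dithered_mean = sum(dithered) / count  # hovers near 0.3: the input survives as an average

print(plain_mean, round(dithered_mean, 3))
```

Without the added noise the error is completely determined by the input, and the input simply disappears.  With it, each individual sample is noisier, but the error no longer tracks the input and the underlying level is still there to be found.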

There are actually many different types of noise.  We want to add the best type of noise for the particular circumstances, and this is where it can get a lot more complicated.  BitPerfect uses TPDF (Triangular Probability Density Function) noise.  This type of noise has been shown mathematically to maximally suppress quantization error distortion with the minimum amount of added noise.  Other types of noise have other properties.  One of the most interesting is “Noise Shaped” noise.  This type of noise is more complicated, and in order to work properly has to be added within a frequency-sensitive feedback loop.  It has the interesting property that the added noise is actively shaped away from one portion of the frequency range (where the ear is most sensitive) and into another (where the ear is less sensitive).  Surprisingly, in the right circumstances, noise shaped dither is capable of pushing the noise floor, in the part of the frequency range that matters most, to a level below the notional theoretical limit imposed by the bit depth (approx 6dB per bit).
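The simplest possible feedback loop gives a feel for how the shaping works.  The sketch below is a first-order error-feedback quantizer, which merely pushes noise from low frequencies towards high ones; real noise shapers use more elaborate filters matched to the ear's sensitivity, so treat this as a simplification of my own and not as any product's algorithm.  The low-frequency error is estimated crudely by averaging the error over blocks of 32 samples.

```python
import math
import random


def tpdf(rng):
    """Triangular dither spanning -1..+1 LSB."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


def plain_dither(samples, rng):
    """TPDF dither followed by rounding to the nearest LSB."""
    return [math.floor(x + tpdf(rng) + 0.5) for x in samples]


def shaped_dither(samples, rng):
    """Same quantizer, but each sample is first corrected by the previous error."""
    out = []
    previous_error = 0.0
    for x in samples:
        target = x - previous_error            # feed back the last error
        y = math.floor(target + tpdf(rng) + 0.5)
        previous_error = y - target
        out.append(y)
    return out


def error_powers(samples, output, block=32):
    """Return (total error power, power of the block-averaged error)."""
    errors = [y - x for x, y in zip(samples, output)]
    total = sum(e * e for e in errors) / len(errors)
    averages = [sum(errors[i:i + block]) / block
                for i in range(0, len(errors), block)]
    low = sum(a * a for a in averages) / len(averages)
    return total, low


rng = random.Random(3)   # fixed seed so the run is repeatable
n = 64_000
samples = [20.0 * math.sin(2 * math.pi * t / 500) for t in range(n)]  # in LSB units

plain_total, plain_low = error_powers(samples, plain_dither(samples, rng))
shaped_total, shaped_low = error_powers(samples, shaped_dither(samples, rng))

print(f"total error power: plain {plain_total:.3f}, shaped {shaped_total:.3f}")
print(f"low-frequency error power: plain {plain_low:.5f}, shaped {shaped_low:.5f}")
```

The shaped version carries more noise in total, but a good deal less of it at low frequencies.  The noise has not been removed, only moved.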

It is important to appreciate that dithering the signal adds noise to it – noise that was not there before, and which can never be removed again afterwards.  If you dither an already-dithered music data stream, you will only be adding further noise to the already-added noise which is not normally of much benefit – in fact it will usually degrade the signal.  In particular, adding noise-shaped dither to a data stream that has already received noise-shaped dither can raise the noise to quite unpleasant levels at higher frequencies.  Many CDs are mastered with a final application of noise-shaped dither, so if you have ripped one of these, you don’t really want to be applying more noise-shaped dither when it comes to playback.  Unfortunately it takes a suite of Analytical DSP Apps to determine this, and even those of us who do have these Apps generally cannot be bothered with doing it on any sort of routine basis!

I will comment specifically on two scenarios which will be of relevance to BitPerfect users.  Sample Rate Conversion and Digital Volume Control.

In BitPerfect’s implementation of SRC, the 16-bit or 24-bit integer data is first transcoded to a 64-bit Float format.  SRC comprises some heavy mathematics operating on the 64-bit Float data, at the end of which we have a bunch of 64-bit Float numbers that need to be converted back to integers again.  64-bit Float numbers carry 53 bits of precision, and to convert them back to 16-bit (or 24-bit) integers we have to throw away roughly the least significant 37 bits (or 29 bits, respectively) of data.  The difference between the new 16-bit (or 24-bit) value and the original 53-bit precision becomes the new quantization error.  So it is wise to apply some dither.  Easiest to do would be to choose TPDF, and this would be a good choice.  With a 16-bit output format, there is some potential benefit to be had in applying noise-shaped dither, but you would need to be confident that no further noise-shaped dither is being applied downstream.  With 24-bit data, there is a very fair argument to be made that it does not need any dithering at all, since nothing below the 22nd bit is ever audible anyway.  But dithering 24-bit data can’t hurt either way.
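As a rough sketch of that final conversion step, here is one way to turn floating-point samples back into integers with TPDF dither applied just before rounding.  This is not BitPerfect's code; the function `float_to_int`, the ±1.0 input convention and the clipping behaviour are assumptions made for illustration.

```python
import math
import random


def tpdf(rng):
    """Triangular dither spanning -1..+1 LSB."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


def float_to_int(samples, bits, rng):
    """Convert floats in -1.0..+1.0 to signed integers of the given bit depth."""
    full_scale = 2 ** (bits - 1)
    low, high = -full_scale, full_scale - 1
    out = []
    for x in samples:
        scaled = x * full_scale + tpdf(rng)          # dither is added in LSB units
        rounded = math.floor(scaled + 0.5)           # round to the nearest integer
        out.append(max(low, min(high, rounded)))     # clip to the legal range
    return out


rng = random.Random(4)   # fixed seed so the run is repeatable
floats = [0.5 * math.sin(2 * math.pi * t / 441) for t in range(44_100)]

pcm16 = float_to_int(floats, 16, rng)
pcm24 = float_to_int(floats, 24, rng)

print(pcm16[:5], pcm24[:5])
```

The dither is scaled to the output word length, so the 24-bit result receives proportionally far less added noise than the 16-bit one.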

With digital volume control, we are usually only talking about digital attenuation.  Some DACs can provide an amount of digital gain, but these are relatively few, and digital gain is not normally of interest to audiophiles, so I will ignore it here.  Digital attenuation effectively reduces the bit depth of the music data.  Every 6dB of attenuation loses you one bit of resolution.  So 24dB of attenuation loses you 4 bits of data.  Those lost bits of data drop off the bottom end, into the digital void below the LSB, and so any dither present in the signal gets lost in the process.  It is therefore advisable to re-dither the signal after performing volume control.  TPDF dither would again be a good choice here, but this could also be an ideal place to introduce an appropriate noise-shaped dither function.
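The arithmetic is easy to check.  The sketch below works out how many bits a given attenuation costs, and then confirms it the hard way by attenuating every possible 16-bit value and counting how many distinct output levels survive.  The function names are mine, and the attenuation here is plain rounding with no dither.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB


def bits_lost(attenuation_db):
    """Resolution given up, in bits, for a given amount of digital attenuation."""
    return attenuation_db / DB_PER_BIT


def remaining_bits(source_bits, attenuation_db):
    """Attenuate every possible input value and count the distinct outputs."""
    gain = 10 ** (-attenuation_db / 20)
    full_scale = 2 ** (source_bits - 1)
    levels = {round(sample * gain) for sample in range(-full_scale, full_scale)}
    return math.log2(len(levels))


lost = bits_lost(24)
left = remaining_bits(16, 24)

print(f"24 dB of attenuation costs about {lost:.2f} bits")
print(f"16-bit data is left with about {left:.2f} bits of resolution")
```

So 24dB of attenuation leaves 16-bit data with roughly 12 bits' worth of distinct levels, which is the "4 bits lost" described above.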

In BitPerfect, we apply dither after all SRC operations, using either TPDF or CoreAudio according to the user selection, but we do not yet dither after volume control.  This is to do with limitations on the way we have written our audio engine, but version 1.1 of BitPerfect will include a completely new audio engine that can perform real-time dithering on the volume control.

So spare a thought for those valiant WWII bombardiers.  All this was furthest from their minds!

When you want to pull the wool over someone’s eyes, the easiest way to do it is to employ indisputable facts, and present them in such a way as to enable the listener to draw an inappropriate conclusion, usually with the aid of some logical legerdemain.  Sometimes they won’t bite.  So you move on to myths.  You invoke something that has been repeated so often that, through plain and simple brainwashing, it has come to be regarded as fact even though it has no factual basis to support it.  Sometimes even the myth fails.  So you have little choice but to move on to misconceptions.  Misconceptions are a tricky animal, because they usually have their origins in indisputable facts.  What you do is take a fact, tack something onto it that is not itself supported by facts, and hope to get away with passing it off as a fact by association.  And if all else fails, you just have to lie.

Most of the time deceptions are manifestations of plain and simple dishonesty.  But sometimes misconceptions themselves worm their way into our knowledge base in such a way that we forget where the lines lie between the underlying facts and the remora-like add-ons that attached themselves like suckers.  To create a misconception you need three simple ingredients.  First, you need a subject matter which is inherently complicated, but which can be easily described in simple terms so that people can feel comfortable with it to a certain level.  Next, you need something which is actually incorrect.  Third, you need a logical link – an argument or demonstration by which the flawed conclusion becomes conflated with the factual aspects of the subject matter.

There are many misconceptions that plague the complex world of digital audio.  I am going to try and clarify one of them for you, because you will hear it repeated ad nauseam.  It is the one where you will be told that by careful application of dither, you can extract signals whose amplitude lies below the LSB (least significant bit), and which, apparently, cannot otherwise be encoded.

I want to introduce you first to the concept of image enhancement.  Many of you will have come across a scene in a movie or TV show where a blurred image is magically sharpened to incredible resolution with little more than a handful of keystrokes on a computer.  It is utter balderdash.  A misconception.  But behind it lie some real facts.  I have seen real demonstrations of blurry, indistinct video of a military nature, where a computer is asked to uncover the presence of, say, a tank.  When the magic button is pressed, a tank does indeed appear out of the murk.  These demonstrations are very compelling and very convincing – and, yes, are factual.  The key element of what is happening is that in order to see a tank, you have to be looking for a tank.  If the murk hides a car, a hot-dog vendor, or even Osama Bin Laden holding a bazooka, you will never see them.  You need to be looking specifically for them.  The way the technology works is that the complex object hiding in the murk of the image disturbs the murk ever so slightly, and by correlating the disturbance with what we know of the appearance of a tank, we can infer, to a greater or lesser degree, the presence of the tank.  It is important to note that, so long as we are looking for the tank, we will never be able to perceive Osama and his bazooka.  We have to be looking specifically for him.  And his bazooka.

Applying this to audio dither, the same arguments hold true.  Yes, it is possible to take 16-bit audio data and apply the right sort of dither, and then observe a recorded pure tone at -120dB, which is more than 20dB below the roughly 96dB Signal-to-(Quantization)-Noise ratio of the 16-bit format.  That’s nearly 4 bits below the level of the 16th bit.  This is because we were looking for that specific pure tone using a Fourier Transform.  Music is NOT being encoded at -120dB.  We are simply inferring the presence of the -120dB tone through its residual interaction with the (dithered) noise.  In fact the residual evidence of the tone was only inserted in the first place during the dithering process.
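The test-tone demonstration can be sketched in a few lines.  What follows is an illustration under stated assumptions, not a reference measurement: the sample count, the tone frequency, and the use of a single hand-rolled DFT bin in place of a full Fourier Transform are all choices made here for simplicity.

```python
import math
import random

def tpdf(rng):
    """TPDF dither, +/-1 LSB peak: the sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def bin_amplitude(samples, cycles):
    """Amplitude at exactly `cycles` per record length: one DFT bin."""
    n = len(samples)
    w = 2 * math.pi * cycles / n
    re = sum(s * math.cos(w * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(w * i) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

def demo(n=200_000, cycles=4_000, seed=1):
    """Quantize a -120 dB tone to integer LSBs, with and without dither."""
    rng = random.Random(seed)
    amp = 32767 * 10 ** (-120 / 20)        # about 0.033 of one 16-bit LSB
    w = 2 * math.pi * cycles / n
    tone = [amp * math.sin(w * i) for i in range(n)]
    undithered = [round(x) for x in tone]  # every sample rounds to zero
    dithered = [round(x + tpdf(rng)) for x in tone]
    return {
        "true_amplitude": amp,
        "undithered": bin_amplitude(undithered, cycles),
        "dithered": bin_amplitude(dithered, cycles),
        # Looking in a different bin (hypothetical offset) finds only noise.
        "wrong_bin": bin_amplitude(dithered, cycles + 137),
    }

if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name:>15}: {value:.5f} LSB")
```

The tone only shows up because the correlation is aimed at its exact frequency and runs over a couple of hundred thousand samples.  Aim the same correlation at a different bin and all that comes back is the dither noise.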

Here is the true test of whether one can encode music below the 16th bit purely through the magic of dither.  Take an undithered 16-bit recording.  Apply 24dB of digital attenuation using whatever dithering technique you like.  Mathematics says that the 4 least significant bits of the music data will be pushed below the level of the 16th bit where they are simply lost forever.  However, according to the misconception, the 4 least significant bits of the 16-bit data stream have somehow been safely preserved by the dither, all ready to be reconstructed.  Now take the attenuated data stream and apply 24dB of gain.  Have you managed to reconstruct the original music data?  No, you haven’t.  Not even close.
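This round-trip test is easy to try for yourself.  The sketch below assumes 24dB of attenuation (about 4 bits), TPDF dither, and a synthetic sine wave standing in for music; the function names and the test signal are hypothetical.

```python
import math
import random

def make_signal(n=2000, peak=10000):
    """Hypothetical stand-in for 16-bit music: one integer-valued sine wave."""
    return [round(peak * math.sin(2 * math.pi * 0.01237 * i)) for i in range(n)]

def attenuate_then_restore(samples, attenuation_db=24.0, seed=0):
    """Attenuate with TPDF dither, requantize, then apply the same gain back."""
    rng = random.Random(seed)
    gain = 10 ** (-attenuation_db / 20)
    attenuated = [
        round(s * gain + rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5))
        for s in samples
    ]
    restored = [round(a / gain) for a in attenuated]
    return attenuated, restored

def compare(original, restored):
    """Count bit-exact matches and compute the RMS error in LSBs."""
    matches = sum(1 for o, r in zip(original, restored) if o == r)
    sq = sum((o - r) ** 2 for o, r in zip(original, restored))
    return matches, math.sqrt(sq / len(original))

if __name__ == "__main__":
    signal = make_signal()
    _, restored = attenuate_then_restore(signal)
    matches, rms = compare(signal, restored)
    print(f"bit-exact samples: {matches} of {len(signal)}")
    print(f"RMS error: {rms:.2f} LSB")
```

Only a small fraction of the samples come back bit-exact, and the error is several LSBs in size.  The dither has made the damage sound like benign noise; it has not preserved the lost bits.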

This is not to say that dither is not useful or valuable.  The fact is that it CAN reduce the measured noise floor to below the theoretical quantization noise limit over a certain range of the audio bandwidth.  This is useful and valuable.  But to demonstrate the extraction of test tones from below that limit is only a mathematical party trick, and nothing more should be inferred from it.

I close with a rhyme, wherein the landlord of a country Inn uses his own form of dither to fit ten men into nine bedrooms:

Ten weary travellers, cold and wet
To a country Inn did come.
The night was cold, their clothes were damp,
Their hands and feet were numb.

“Come in, come in”, the landlord cried,
“A room for all ye men,
“But I have only nine spare beds,
“And ye are numbered ten.”

“Then one of us shall take the floor,
“For none of us are gay!”
“Nay, nay, my friends”, the landlord cried,
“There is an easy way.”

Two men he placed in room marked A,
The third he lodged in B,
The fourth and fifth in C and D,
The sixth in bedroom E.

Seven, eight, nine, in F, G, H,
Then back to A did fly,
Wherein remained the tenth and last,
And lodged him safe in I.

For each of travellers ten.
And this is it that puzzles me,
And many wiser men.

Not many people can make that proud boast.  But we can, and now you can too, with our new range of BitPerfect merchandise.  Buy a T-shirt, cap, tote bag, hoodie, mug, or even a skin for your iPhone or Samsung Galaxy, and let the world know that your Bits are Perfect too!