Here is something that a lot of people either don’t know or didn’t realize.  How is silence encoded in DSD?

You may remember reading my post a while back called Two’s Complement.  This is where I introduced the concept of the Signed Integer.  We allow the most significant bit of an N-bit number to encode whether the number is positive or negative, and the remaining N-1 bits encode the magnitude.  Positive numbers go from zero upwards, and negative numbers from -1 downwards.  The key point here is that we have an unambiguous representation of zero.

But DSD has only one bit.  If we assign that bit as the sign, then there are no bits left to assign the magnitude.  Instead a value of “1” represents the highest possible voltage (nominally +1V) and a value of “0” represents the lowest possible voltage (nominally -1V).  Zero volts is smack dab in between.

You recall that a convenient way to view DSD is to think of the signal being the average value of a sequence of consecutive bits in the DSD bitstream.  Therefore, anything can be used to represent zero, so long as it averages out to have the same number of ones as zeros.  And that’s more or less right, but we have to understand what that means in practice, so that, if the need should arise (which it does, as we shall see shortly), we understand how to best represent zero.
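To make the averaging idea concrete, here is a minimal Python sketch.  The function name `dsd_average` and the choice to map a “1” to +1.0 and a “0” to -1.0 are my own illustrative assumptions (following the nominal voltages mentioned above), not anything taken from a DSD specification.

```python
def dsd_average(bits: str) -> float:
    """Mean level of a DSD bit pattern.

    Assumption for illustration: '1' maps to +1.0 and '0' maps to -1.0,
    mirroring the nominal +1V / -1V levels.
    """
    levels = [1.0 if b == "1" else -1.0 for b in bits]
    return sum(levels) / len(levels)


print(dsd_average("11110000"))  # 0.0  (four ones, four zeros)
print(dsd_average("11101110"))  # 0.5  (six ones, two zeros)
print(dsd_average("11111111"))  # 1.0  (all ones)
```

Any pattern with equal numbers of ones and zeros averages to zero, which is all the simple picture asks for.  What it does not tell you is where the leftover high frequency energy ends up.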

The simplest representation sounds like it ought to be 10101010.  Since DSD is a 2.8MHz 1-bit data stream, this particular sequence actually encodes a 1.4MHz signal at maximum volume.  It seems very bizarre that to encode silence, we have to do this by instead encoding the highest possible frequency at the highest possible volume.  Bizarre, but true.  It works because when this bitstream gets to the DAC, the required low-pass filter will attenuate the 1.4MHz component out of existence.  We could also use a sequence like 10110010 which works just as well.  In fact it is arguably slightly better because the high frequency content is slightly lower in amplitude, although it is spread out over more frequencies.  This is a choice you get to make: whether to encode silence as a high level signal at a frequency furthest away from the audio band, or as a range of lower level signals at frequencies a little closer to the audio band.  There is no single right answer.
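If you want to see that trade-off for yourself, the sketch below takes the two 8-bit patterns, treats each as one period of an endlessly repeating stream, and computes a plain discrete Fourier transform.  The helper names and the ±1 mapping are assumptions of mine for illustration, and I use the DSD64 sample rate of 2.8224MHz (64 × 44.1kHz) to label the frequency bins.

```python
import cmath

SAMPLE_RATE_HZ = 2_822_400  # DSD64: 64 x 44.1 kHz


def dft_magnitudes(bits: str) -> list[float]:
    """Normalised DFT magnitudes (bins 0..N/2) of one period of a bit pattern.

    Assumption for illustration: '1' maps to +1.0 and '0' maps to -1.0.
    """
    levels = [1.0 if b == "1" else -1.0 for b in bits]
    n = len(levels)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, x in enumerate(levels)
        )
        mags.append(abs(total) / n)
    return mags


for pattern in ("10101010", "10110010"):
    n = len(pattern)
    print(pattern)
    for k, mag in enumerate(dft_magnitudes(pattern)):
        freq_khz = k * SAMPLE_RATE_HZ / n / 1000
        print(f"  {freq_khz:8.1f} kHz  {mag:.3f}")
```

For 10101010 everything lands in the top bin at about 1.4MHz.  For 10110010 the top bin drops to half that magnitude, and the remainder shows up in the bins below it, the lowest of which sits at about 353kHz.  Both patterns have nothing at all in the zero-frequency bin, which is what makes them silence.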

I said this question comes up as a practical matter, and indeed it does.  The specification for the DSD file format chooses to break it up into chunks of about 4kB each, and does not allow for smaller chunks.  However, a DSD bitstream can be of arbitrary length, and so if it is to be encoded into the approved file format, it needs some extra signal to be appended to the end of it to bring the last chunk up to its required 4kB size.  Obviously, this extra padding needs to be silence.  But which specific representation of DSD silence does the DSD file format specification tell us to pad it out with?  The answer – quite incredibly – is with zeros.  It is quite specific about that.  But 00000000 does not encode silence in the DSD world.  It encodes full scale negative voltage.  In fact, for reasons I won’t elaborate here, it is worse than that – it encodes a negative voltage which is deep into clipping.  When you play this back, the result is not silence, but **BANG!!!**.  Yes – crazy but true – the specification for the DSD file format calls for every track to be padded out with a digital signal which could propel your tweeter dome across the room!
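Here is roughly what padding the final chunk looks like, as a hedged Python sketch.  The 4096-byte block size, the function name `pad_block`, and the choice of 0xAA (the byte whose bits are 10101010) as the silence fill are all assumptions of mine for illustration.  Passing `fill=0x00` reproduces the zero padding described above.

```python
BLOCK_SIZE = 4096    # assumed chunk size in bytes ("about 4kB")
SILENCE_BYTE = 0xAA  # bits 10101010, alternating in either bit order


def pad_block(data: bytes, block_size: int = BLOCK_SIZE,
              fill: int = SILENCE_BYTE) -> bytes:
    """Pad raw 1-bit audio bytes up to a whole number of blocks."""
    remainder = len(data) % block_size
    if remainder == 0:
        return data  # already a whole number of blocks
    return data + bytes([fill]) * (block_size - remainder)


audio = bytes([0x69]) * 5000            # stand-in for real DSD data
silent = pad_block(audio)               # padded with 10101010...
by_the_book = pad_block(audio, fill=0)  # padded with zeros
print(len(silent), hex(silent[-1]), hex(by_the_book[-1]))
```

The only difference between the two results is the fill byte in the last 3192 bytes, and that one byte value is the whole difference between silence and full scale negative.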

Do we need to be alarmed?  Not really.  By now, this problem is well understood, and so playback software such as BitPerfect recognizes these undesirable **BANG!!!** signals and replaces them with something that properly encodes zero.  Also, for the most part, I exaggerate for humorous effect, to get my point across.  But the reality is that certain speaker designs out there could be very expensively damaged by “correctly” coded DSD silence which is not corrected by the DSD playback software.
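To give a flavour of the kind of correction a player can make, here is a hypothetical Python sketch.  It is emphatically not BitPerfect’s actual code; the function name, the 0xAA fill byte, and the `min_run` threshold are all my own assumptions.  It simply assumes that a long run of zero bytes at the very end of the data is padding rather than music.

```python
def replace_trailing_zero_padding(block: bytes, fill: int = 0xAA,
                                  min_run: int = 8) -> bytes:
    """Swap a trailing run of 0x00 bytes for a silence pattern.

    Hypothetical heuristic: runs shorter than min_run bytes are left
    alone, on the assumption that they might be genuine audio.
    """
    stripped = block.rstrip(b"\x00")
    run = len(block) - len(stripped)
    if run < min_run:
        return block
    return stripped + bytes([fill]) * run


block = bytes([0x69]) * 10 + bytes(20)  # 10 bytes of audio, 20 of zero padding
fixed = replace_trailing_zero_padding(block)
print(fixed.hex())
```

The length of the data is unchanged, so the timing of the track is untouched; only the offending tail is rewritten.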

Some DSD content producers do the decent thing and prepare DSD files that correct the problem at the source using proper DSD silence, and are therefore, strictly speaking, out of compliance with the specifications.  My friend Cookie Marenco of Blue Coast Records is particularly conscientious in this regard.  Others continue to produce “BANG-encoded” files in strict adherence to the specs.

You want to know the really sad part about this?  The solution is really very simple.  All we need is to issue a revision to the DSD file specification which corrects this problem by simply specifying the preferred digital bitstream that should be employed for the purposes of padding silence.  It is trivial beyond belief.  It requires a 30-second edit to the file spec.  Job done.  But the people who “own” the file spec have no interest in making this happen.  Their position is that it is not a problem because all the player software corrects for it.

These are the people who spent a fortune developing SACD and then botched the launch.  Ah well … at least your tweeter dome won’t end up in your coffee mug if you use BitPerfect!