Monthly Archives: November 2013

Last week, we learned that by adopting a PCM format, we also constrain ourselves with the need to employ radical low-pass filtering in both the ADC and DAC stages in order to eliminate the fundamental problem of aliasing. Yesterday we learned that we can use oversampling and noise shaping to overcome some of the limitations imposed by Bit Depth in PCM systems. Taking both together, we learned that by increasing both the Bit Depth and the Sample Rate we can make inroads into the audible effects of both of these limitations.

In practice, there is no point in extending the Bit Depth beyond 24 bits. This represents a dynamic range of 144dB, and no recording system we know of can present waveforms with that level of dynamic range to the input of an ADC. On the other hand, even by extending the Sampling Rate out to 384kHz (the highest at which I have ever seen commercially available music released), the brick-wall filter requirements are still within the territory where we would anticipate their effects to be audible. A 24/384 file is approximately 13 times the size of its 16/44.1 equivalent. That gets to be an awfully big file. In order for the filter requirements to be ameliorated to the point where we are no longer concerned with their sonic impact, the sample rate needs to be out in the MHz range. But a 24-bit 2.82MHz file would be a whopping 100 times the size of its 16/44.1 counterpart. Clearly this takes us places we don’t want to go.

But wait! Didn’t we just learn that by oversampling and Noise Shaping we can access dynamic range below the limitation imposed by the Bit Depth? Increasing the sample rate by a factor of 64 to 2.82MHz would mean that our audio frequencies (20Hz – 20kHz) are all going to be massively oversampled. Perhaps we can reduce the Bit Depth? Well, with oversampling alone, all we can do is shave a paltry 3 bits off our Bit Depth. But do not get discouraged: with Noise Shaping it turns out we can reduce it all the way down to 1-bit. A 1-bit 2.82MHz file is only 4 times larger than its 16/44.1 equivalent, which is actually quite manageable. But really? Can we get more than 100dB of dynamic range from a 1-bit system just by sampling at 2.82MHz?
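For anyone who wants to check those ratios, here is a quick back-of-the-envelope sketch in Python. It simply multiplies bit depth by sample rate and compares against 16/44.1, ignoring channel count and any container overhead, so the numbers are raw data-rate ratios rather than exact file sizes (which is why the 24-bit 2.8224MHz case comes out at 96x rather than a round 100x).

```python
# Raw data-rate ratios relative to 16/44.1 (bits per second per channel),
# ignoring channel count and container overhead.
formats = {
    "24/192 PCM": (24, 192_000),
    "24/384 PCM": (24, 384_000),
    "24-bit 2.8224MHz": (24, 2_822_400),
    "1-bit 2.8224MHz (DSD64)": (1, 2_822_400),
}

baseline = 16 * 44_100
for name, (bits, rate) in formats.items():
    print(f"{name}: {bits * rate / baseline:.1f}x the size of 16/44.1")
```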

Yes, we can, but I am not going anywhere near the mathematics that spit out those numbers. That is the preserve of experts only. But here’s what we do. When we encode data with a 1-bit number, the quantization error is absolutely massive, and can be anywhere between +100% and -100% of the signal itself. Without any form of noise shaping, this quantization noise would in practice sit at a level of around -20dB (due to the effect of oversampling alone) but would extend all the way out to a frequency of 1.41MHz. But because of the massive amount of oversampling, we can attempt to use Noise Shaping to depress the quantization noise in the region of 0-20kHz, at the expense of increasing it at frequencies above, say, 100kHz. In other words, we would “shape” it out of the audio band and up into the frequency range where we are confident no musical information lives, and plan on filtering it out later. We didn’t choose that sampling rate of 2.82MHz by accident. It turns out that is the lowest sample rate at which we can push the noise floor more than 100dB down across the entire audio frequency bandwidth.
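If you like to tinker, here is a minimal sketch of the principle in Python. It implements a first-order 1-bit sigma-delta modulator of my own construction; real DSD converters use much higher-order modulators with carefully engineered noise shaping, so this is an illustration of the mechanism rather than of any production design.

```python
import numpy as np

def sigma_delta_1bit(x):
    """First-order 1-bit sigma-delta modulator (illustration only).
    x: oversampled input samples in the range -1..+1.
    Returns a stream of +1/-1 output bits."""
    out = np.empty(len(x))
    integrator = 0.0
    feedback = 0.0
    for n, sample in enumerate(x):
        integrator += sample - feedback               # accumulate the error
        feedback = 1.0 if integrator >= 0 else -1.0   # the 1-bit quantizer
        out[n] = feedback
    return out

# A 1 kHz tone, sampled at 64 x 44.1 kHz = 2.8224 MHz
fs = 64 * 44_100
t = np.arange(fs // 10) / fs                          # 100 ms of signal
bits = sigma_delta_1bit(0.5 * np.sin(2 * np.pi * 1000 * t))

# The spectrum of the bit stream shows the tone at 1 kHz with the
# quantization noise pushed up towards high frequencies.
spectrum = np.abs(np.fft.rfft(bits * np.hanning(len(bits))))
```

Plot spectrum on a logarithmic scale and you will see the 1kHz tone sitting well clear of a noise floor that climbs steadily with frequency: that rising noise is the “shaping” described above.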

To convert this signal back to analog, it turns out this format is much easier to implement than multi-bit PCM. Because we only encode 1-bit, we only have to create an output voltage of either Maximum or Minimum. We are not concerned with generating seriously accurate intermediate voltages. To generate this output, all we have to do is switch back and forth between Maximum and Minimum according to the bit stream. This switching can be done very accurately indeed. Then, having generated this binary waveform, all we have to do is pass it through a very simple, slow roll-off, low pass filter, the sort that won’t give your audio circuit designer indigestion. Job done.
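Continuing the sketch above, the playback side can be mocked up just as simply. The function below is a crude software stand-in for that gentle analog filter (a 4th-order Butterworth with a 30kHz cutoff is my arbitrary choice, not a prescription); feed it the bits and fs from the modulator sketch and the original 1kHz sine emerges recognizably at the output.

```python
from scipy import signal

def dsd_to_analog(bits, fs, cutoff_hz=30_000):
    """Crude stand-in for the DSD playback stage: treat the +1/-1 bit
    stream as a two-level voltage and pass it through a gentle,
    slow roll-off low-pass filter. Nothing remotely brick-wall here."""
    b, a = signal.butter(4, cutoff_hz, fs=fs)
    return signal.filtfilt(b, a, bits)
```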

This is a pretty interesting result. We have managed to eliminate the need for those nasty brick-wall filters at both the ADC and DAC, and at the same time capture a signal with exceptional dynamic range across the audio bandwidth. This, my friends, is DSD.

As with a lot of things, when you peek under the hood, things always get a little more complicated, and I will address some of those complications tomorrow.

Being strictly accurate, DSD (Direct Stream Digital) is a term coined by Sony and Philips, and refers to a very specific audio protocol. It is a 1-bit Sigma-Delta Modulated data stream encoded at a sample rate of 2.8224MHz. However, the term has now been arbitrarily widened by the audio community at large, to the point where we find it employed to apply generically to an ever-widening family of Sigma-Delta Modulated audio data streams. We read the terms Double-DSD, Quadruple DSD, and “DSD-Wide” applied to various SDM-based audio formats, so that DSD has become a catch-all term somewhat like PCM. There are many flavours of it, and some are claimed to be better than others.

So, time to take a closer look at DSD in its broadest sense, and hopefully wrap some order around the confusion.

Strangely enough, the best place to start is via a detour into the topic of dither which I discussed a couple of weeks back. You will recall how I showed that a 16-bit audio signal with a maximum dynamic range of 96dB can, when appropriately dithered, be shown using Fourier Analysis to have a noise floor that can be as low as -120dB. I dismissed that as a digital party trick, which in that context it is. But this time it is apropos to elaborate on that.

The question is, can I actually encode a waveform that has an amplitude below -96dB using 16-bit data? Yes I can, but only if I take advantage of a process called “oversampling”. Oversampling works a bit like an opinion poll. If I ask your opinion on whether Joe or Fred will win the next election, your response may be right or may be wrong, but it has limited value as a predictor of outcome. However, if I ask 10,000 people, their collective opinions may prove to be a more reliable measure. What I have done in asking 10,000 people is to “oversample” the problem. The more people I poll, the more accurate the outcome should be. Additionally, instead of just predicting that Joe will win (sorry, Fred), I start to be able to predict exactly how many points he will win by, even though my pollster never even asked that question in the first place!

In digital audio, you will recall that I showed how an audio signal needs to be sampled at a frequency which is at least twice the highest frequency in the audio signal. I can, of course, sample it at any frequency higher than that. Sampling at a higher frequency than is strictly necessary is called “oversampling”. There is a corollary to this. All frequencies in the audio signal that are lower than the highest frequency are therefore inherently being oversampled. The lowest frequencies are being oversampled the most, and highest frequencies the least. Oversampling gives me “information space” I can use to encode a “sub-dynamic” (my term) signal. Here’s how…

At this point I wrote and then deleted three very dense and dry paragraphs which described, and illustrated with examples, the mathematics of how oversampling works. But I had to simplify it too much to make it readable, in which form it was too easy to misinterpret, so they had to go. Instead, I will somewhat bluntly present the end result: The higher the oversampling rate, the deeper we can go below the theoretical PCM limit. More precisely, each time we double the sample rate, we can encode an additional 3dB of dynamic range. But there’s no free lunch to be had. Simple information theory says we can’t encode something below the level of the Least Significant Bit (LSB), and yet that’s what we appear to have done. The extra “information” must be encoded elsewhere in the data, and it is. In this case it is encoded as high levels of harmonic distortion. The harmonic distortion is the mathematical price we pay for encoding our “sub-dynamic” signal. This is a specific example of a more general mathematical consequence, which says that if we use the magic of oversampling to encode signals below the level of the LSB, other signals – think of them as aliases if you like – are going to appear at higher frequencies, and there is nothing we can do about that.
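For what it’s worth, the oversampling-only part of that result can be stated in one line; the textbook approximation is that the extra dynamic range is 10·log10 of the oversampling ratio. A two-line sketch:

```python
import math

def oversampling_gain_db(oversampling_ratio):
    """Textbook approximation: extra dynamic range from oversampling
    alone (no noise shaping) is 10*log10(OSR), about 3 dB per doubling."""
    return 10 * math.log10(oversampling_ratio)

print(oversampling_gain_db(2))    # ~3 dB per doubling of the sample rate
print(oversampling_gain_db(64))   # ~18 dB, or roughly 3 bits' worth
```

Note that even 64 times oversampling buys only about 18dB, roughly three bits’ worth, which is why oversampling on its own can never take us very far below the level of the LSB.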

Let’s go back again to dither, and consider a technique – Noise Shaping – that I mentioned in a previous post. Noise shaping relates to the fact that when we quantize a signal in a digital representation, the resultant quantization error looks like a noise signal added to the waveform. What is the spectrum of this noise signal? It turns out that we have a significant level of control over what it can look like. At lower frequencies we can squeeze that noise down to levels way below the value of the LSB, lower even than can be achieved by oversampling alone, but at the expense of huge amounts of additional noise popping up at higher frequencies. That high-frequency noise is the “aliases” of the sub-dynamic “low frequency” information that our Noise Shaping has encoded – even if that low frequency information is silence(!). This is what we mean by Noise Shaping – we “shape” the quantization noise so that it is lower at low frequencies and higher at high frequencies. For CD audio, those high frequencies must all be within the audio frequency range, and as a consequence, you have to be very careful in deciding where and when (and even whether) to use it, and what “shape” you want to employ. Remember – no free lunch.
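As a concrete (and deliberately simplified) illustration of the mechanics, here is a first-order error-feedback requantizer in Python that reduces a high-resolution signal to 16 bits while pushing the quantization error towards high frequencies. It is only a toy: production noise shapers use carefully designed higher-order filters and add dither ahead of the quantizer, neither of which this sketch does.

```python
import numpy as np

def requantize_16bit_noise_shaped(x):
    """Requantize floats in -1..+1 to 16-bit integers using first-order
    error feedback, so the quantization error is pushed towards high
    frequencies instead of being spread evenly (illustration only)."""
    out = np.empty(len(x), dtype=np.int16)
    error = 0.0
    for n, sample in enumerate(x):
        shaped = sample - error                  # feed back the previous error
        q = int(round(shaped * 32767))           # quantize to 16 bits
        q = max(-32768, min(32767, q))
        error = q / 32767 - shaped               # error made at this step
        out[n] = q
    return out
```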

But if we increase the sample rate we also increase the high frequency space above the limit of audibility. Perhaps we can use it as a place to park all that “shaped” high-frequency noise? Tomorrow, we’ll find out.

I have been using iTunes 11.1.3 all day without encountering any problems.  It should be fine for BitPerfect users to download and install it.

Today we are testing the latest iTunes update 11.1.3 for compatibility with BitPerfect.  I will post my findings later in the day.

Back in late March, I posted some introductory comments here regarding how DACs actually function.  Anyway, following my recent posts on Sample Rate I thought it might be apropos to revisit that subject.

Today’s DACs, with a few very rare (and expensive) exceptions, all use a process called Sigma Delta Modulation (SDM, sometimes also written DSM) to generate their output signal.  A nice way to look at SDM DACs is to visualize them as upsampling their output to a massively high frequency – usually 64, 128 or 256 times 44.1kHz – and taking advantage of the ability to use a very benign analog filter at the output.  That is a gross over-simplification, but for the purposes of the point I am trying to make today, it is good enough.

Doing such high-order up-conversion utilizes a great deal of processing power, and providing that processing power adds cost.  Additionally, the manufacturers of the most commonly used DAC chipsets are very coy about their internal architectures, and don’t disclose their approaches.  I would go so far as to say that some DAC manufacturers actually misunderstand how the DAC chipsets which they buy actually work, and publish misleading information (I have to assume this is not done intentionally) about how their product functions.  Much of this centres around cavalier usage of the terms ‘upsampling’ and ‘oversampling’.  Finally, some DAC manufacturers use DAC chipsets with prodigious on-chip DSP capability (such as the mighty ESS Sabre 9018), and then fail to make full use of it in their implementations.

Let’s study a hypothetical example.  We’ll take a 44.1kHz audio stream that our DAC chip needs to upsample by a factor of 64 to 2.8224MHz, before passing it through its SDM.  The best way to do this would be using a no-holds-barred high-performance sample rate converter.  However, there are some quite simple alternatives, the simplest of which would be to just repeat each of the original 44.1kHz samples 64 times until the next sample comes along.  What this does is to encode the “stairstep” representation of digital audio we often have in mind, in fine detail.  This is acceptable because, in truth, it adds not one jot of information beyond what was already in the 44.1kHz stream.  Personally, I would refer to this as oversampling rather than upsampling, but you cannot rely on DAC manufacturers doing likewise.
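In code, the repeat-each-sample approach really is as blunt as it sounds; a sketch (the function name is mine, not anything from a datasheet):

```python
import numpy as np

def repeat_hold_64x(samples_44k1):
    """'Oversampling' in the sense used above: repeat each 44.1 kHz
    sample 64 times (a zero-order hold at 2.8224 MHz). No information
    is added; the stairstep is simply described in finer detail,
    complete with its images above 24.1 kHz."""
    return np.repeat(samples_44k1, 64)
```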

If we are going to use this approach, though, it leads us down a certain path.  It results in the accurate recreation of the stairstep waveform at the output of the DAC.  Even though we have oversampled by a factor of 64 in our SDM process, the output of our DAC has been a faithful reproduction of a 44.1kHz sampled waveform.  This waveform therefore needs to go through an analog brick-wall filter to strip out the aliases which are embedded within the stairstep.  This is exactly as we discussed in my last post on Sample Rates.

In principle, therefore, by upsampling (using proper Sample Rate Conversion) our 44.1kHz audio by a factor of 2 or 4 prior to sending it to the DAC, we can avail ourselves of the possibility that the DAC can instead implement a less aggressive, and better-sounding, brick-wall filter at its output.  That would be nice.  But that is not the way many (and maybe even most) DACs that use this approach are built.  Instead, they use the same brick-wall filter at high sample rates as they do at 44.1kHz.  If your DAC does this you would not expect to hear anything at all in the way of sonic improvement by asking BitPerfect (or whatever other audio software you use) to upsample for you.

So let’s go back a couple of paragraphs, and instead of our DAC oversampling the incoming 44.1kHz waveform, suppose it actually upsamples it using a high quality SRC algorithm.  Bear in mind that all of the audio content up to 20kHz in a 44.1kHz audio stream is aliased within the frequency band from 24.1kHz to 44.1kHz.  If we are to upsample this, we should really strip the aliases out using a digital brick-wall filter.  Done this way, the result is a clean signal that we can pass into the SDM, and which is precisely regenerated, without the stairstep, at the DAC’s output.  So we no longer need that analog brick-wall filter.
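By way of contrast with the repeat-and-hold sketch above, here is what ‘upsampling’ (again, my usage of the word) looks like in code. SciPy’s polyphase resampler is used purely as an example of a competent general-purpose SRC; it is not a claim about what any particular DAC chip does internally.

```python
from scipy.signal import resample_poly

def upsample_64x(samples_44k1):
    """Interpolate by 64 using a polyphase resampler whose built-in
    digital low-pass filter strips out the images between 24.1 kHz
    and 44.1 kHz before the signal reaches the SDM stage."""
    return resample_poly(samples_44k1, up=64, down=1)
```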

Let’s take another look at these last two scenarios.  One had an analog brick-wall filter at the output, but the other had essentially the same brick-wall filter implemented digitally at an intermediate processing stage.  If the two sound at all different, it can only be because the two filters sounded different.  Is this possible?  In fact, yes it is, and there are two reasons for that.  The first, as I mentioned in a previous post, is that an analog filter has sonic characteristics which derive from both its design, and from the sonic characteristics of the components from which it is constructed.  The digital equivalent – IF (a big IF) properly implemented – only has sonic consequences arising from its design.  There is a further point, which is that digital filters can be designed to have certain characteristics which their analog counterparts cannot, but that fact serves only as a distraction here.  The bottom line here is that a diligent DAC designer, with a proper design, ought to be able to achieve better sound with this ‘upsampling’ approach than with the previously discussed ‘oversampling’ approach (again, I must emphasize this is MY usage of those terminologies, and is not necessarily everybody else’s).

Using the ‘upsampling’ approach I have just described, it should once again make little difference whether you send your music to the DAC at its native sample rate, or if you upsample it first using BitPerfect (or whatever).  However, all this assumes that the upsampling algorithm used by the DAC is at least as good as the one used by BitPerfect.  There is no guarantee that this will be so, in which case you may find that you get improved results by using BitPerfect to upsample for you to the maximum supported by your DAC.  And you should use one of the SoX upsampling algorithms provided by BitPerfect, rather than CoreAudio.

The bottom line here is that you should expect your DAC to sound better (or at least as good) with your music sent to it at its native sample rate than with it upsampled by BitPerfect.  And if it doesn’t, the difference is probably down to BitPerfect’s upsampling algorithm sounding better than the one implemented in your DAC’s DSP firmware.

So, in summary, in light of all the above, our recommendation here at BitPerfect is that you do NOT use BitPerfect to upsample for you, unless you have conducted some extensive listening tests and determined that upsampling sounds better in your system.  These tests should include serious auditioning of BitPerfect’s three SoX algorithms.

I confess to having a weakness for good food and good wine, as well as good sound.  I enjoy cooking, and am not too bad at it, although I offer no pretensions to being any sort of chef.  So it is not surprising that I also get a kick out of watching the TV show Masterchef.

If you don’t know the premise of the show, it goes like this.  Two dozen of the best (amateur) home cooks in America are set cooking challenges by three top celebrity chefs, headed up by the fearsome Gordon Ramsay.  Each week one of them gets eliminated.  At the end of it all, the surviving chef wins the big prize.  The thing is, the challenges these amateur chefs get set are quite mind-blowingly difficult, and in addition they have to compete under very serious time pressures.  Watching the show, I always find myself thinking that the best 24 professional chefs in the country – and certainly ANY of the contestants in Ramsay’s companion show “Hell’s Kitchen” – would find the competition no less challenging.

So, to my astonishment, the producers at Masterchef came up with the notion of Masterchef Junior, where the same format would instead be opened to the 24 best chefs in America, but this time in the age range 10-13 years old.  The challenges faced by these junior chefs would be no less formidable than those faced by the adults.

Here’s the thing, though.  If you had pitched that idea to me before I had seen the show, I would have laughed and said that Masterchef Junior would be of marginal interest, and then only to Soccer Moms.  The reality turned out to be rather different.

Instead we were treated to the sight of 10-year-olds cooking stunningly (and I mean stunningly – things I couldn’t begin to imagine taking on) complex foods, with no preparation, under the very same pressure-cooker time constraints, and held accountable to the same unyielding standards, as their adult counterparts.  It blows my mind.  Imagine dining at the most expensive restaurant you know of, having a great meal, and being introduced to the chef only to find that he or she is still at elementary school.  And, cynical as I am regarding the so-called “unscripted” nature of TV reality shows, I find it hard to believe that all of this is not very real.

I happen to believe that the current generation of children growing up in North America is doing so with the greatest sense of entitlement of any generation that has ever lived, coupled with the least intention of developing the skills necessary to make good on those expectations.

That said, I now know that there are at least 24 kids out there who, in whatever direction their lives and careers will eventually take them, have truly enormous – dare I say unlimited – potential.


In yesterday’s post we found ourselves wondering whether a high-rez recording needs to expand its high frequency limit beyond 20kHz, and whether squeezing a brick-wall filter into the gap between 20kHz and 22.05kHz is really that good of an idea.  Today we will look at what we might be able to do about those things.

First, let’s ignore the extension of the audio bandwidth above 20kHz and look at the simple expedient of doubling the sample rate from 44.1kHz to 88.2kHz.  Our Nyquist Frequency will now go up from 22.05kHz to 44.1kHz.  Two things are going to happen, which are quite interesting.  To understand these we must look back at the two brick-wall filters we introduced yesterday, one protecting the A-to-D converter (ADC) from receiving input signals above the Nyquist Frequency, and the other protecting the output of the D-to-A converter (DAC) from generating aliased components of the audio signal at frequencies above the Nyquist Frequency.  They were, to all intents and purposes, identical filters.  In reality, not so, and at double the sample rate it becomes evident that they have slightly different jobs to do.

We start by looking at the filter protecting the input to the ADC.  That filter still has to provide no attenuation at all at 20kHz and below, but now the 96dB attenuation it must provide need only happen at 44.1kHz and above.  That requirement used to be 22.05kHz and above.  The distance between the highest signal frequency and the Nyquist Frequency (the roll-over band) is now over 10 times wider than it was before!  That is a big improvement.  But let’s not get carried away by that – it is still a significant filter, one having a roll-off rate of nearly 100dB per octave.  By comparison, a simple RC filter has a roll-off rate of only 6dB per octave.

Now we’ll look at the filter that removes the aliasing components from the output of the DAC.  Those components are aliases of the signal frequencies that are all below 20kHz.  As described in Part I, those aliases will be generated within the band of frequencies that lies between 68.2kHz and 88.2kHz.  If there is no signal above 20kHz, then there will be no aliasing components below 68.2kHz.  Therefore the requirements for the DAC’s anti-aliasing filter are a tad easier still.  We still need our brick wall filter to be flat below 20kHz, but now it can afford to roll over more slowly, and only needs to reach 96dB at 68.2kHz.

Doubling the sample rate yet again gives us more of the same.  The sample rate is now 176.4kHz and its Nyquist Frequency is 88.2kHz.  The DAC filter does not need to roll off until 156.4kHz!  These filters are significantly more benign.  In fact, you can argue that since the aliasing components will all be above 156.4kHz they will be completely inaudible anyway – and might not in fact even be reproducible by your loudspeakers!  Some DAC designs therefore do away entirely with the anti-aliasing filters when the sample rate is high enough.
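To put some rough numbers on how much more benign these filters become, here is a sketch that asks SciPy how steep an idealized analog Butterworth filter would need to be in each case, assuming we allow 0.1dB of ripple up to 20kHz (truly “no attenuation at all” is not achievable) and demand 96dB of attenuation where aliases can first appear. Real anti-alias filters use other topologies, so take the absolute numbers as illustrative; the trend is the point.

```python
import numpy as np
from scipy import signal

cases = {
    "44.1 kHz (aliases from 22.05 kHz)": 22_050,
    "88.2 kHz (aliases from 44.1 kHz)": 44_100,
    "176.4 kHz (aliases from 88.2 kHz)": 88_200,
}
for name, stop_hz in cases.items():
    # Butterworth order needed: <= 0.1 dB up to 20 kHz, >= 96 dB at stop_hz
    order, _ = signal.buttord(2 * np.pi * 20_000, 2 * np.pi * stop_hz,
                              gpass=0.1, gstop=96, analog=True)
    print(f"{name}: order ~{order}")
```

On these assumptions the 44.1kHz case demands a filter of well over a hundredth order, while at 176.4kHz a single-digit order suffices.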

You can keep on increasing the sample rate, and make further similar gains.

Obviously, the higher sample rates also give us the option of encoding audio signals with a correspondingly higher bandwidth.  Equally obviously, that advantage comes at the expense of some of the filter gains, which vanish completely once the desired audio frequency bandwidth is extended all the way out to the new Nyquist Frequency.  But even so, by extending the high frequency limit of the audio signal out to 30kHz, little is given up in filter performance, particularly with a sample rate of 176.4kHz.

So far I have only mentioned sample rates which are multiples of 44.1kHz, whereas we know that 96kHz and 192kHz are popular choices also.  From the point of view of the above arguments concerning brick-wall filters, 96kHz vs 88.2kHz (for example) makes no difference whatsoever.  However, there are other factors which come into play when you talk about the 48kHz family of sample rates vs the 44.1kHz family.  These are all related to what we call Sample Rate Conversion (SRC).

If you want to double the sample rate, one simple way to look at it is that you can keep all your original data, and just interpolate one additional data point between each existing data point.  However, if you convert from one sample rate to another which is not a convenient multiple of the first, then very very few – in fact, in some cases none – of the sample points in the original data will coincide with the required sample points for the new data.  Therefore more of the data – and in extreme cases all of the data – has to be interpolated.  Now, don’t get me wrong here.  There is nothing fundamentally wrong with interpolating.  But, without wanting to get overly mathematical, high quality interpolation requires a high quality algorithm, astutely implemented.  It is not too hard to make one of lesser quality, or to take a good one and implement it poorly.

Downconverting – going from a high sample rate to a lower one – is fraught with even more perils.  For example, going from 88.2kHz sample rate to 44.1kHz sounds easy.  We just delete every second data point.  You wouldn’t believe how many people do that, because it is easy.  But by doing so you make a HUGE assumption.  You see, 88.2kHz data has a Nyquist Frequency of 44.1kHz and therefore has the capability to encode signals at any frequency up to 44.1kHz.  However, music with a sample rate of 44.1kHz can only encode signals up to 22.05kHz.  Any signals above this frequency will be irrecoverably aliased down into the audio band.  Therefore, when converting from any sample rate to any lower sample rate, it is necessary to perform brick-wall filtering – this time in the digital domain – to eliminate frequency content above the Nyquist Frequency of the target sample rate.  This makes down-conversion a more challenging task than up-conversion if high quality is paramount.
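In code, the difference between doing this properly and taking the shortcut is stark; a sketch, again using SciPy’s resampler as a stand-in for a good SRC:

```python
from scipy.signal import resample_poly

def downsample_88k2_to_44k1(x):
    """Halve the sample rate properly: apply a digital brick-wall
    low-pass filter, then keep every second sample (the resampler
    does both steps in one pass)."""
    return resample_poly(x, up=1, down=2)

def naive_downsample(x):
    """The tempting shortcut: just delete every second sample.
    Anything between 22.05 kHz and 44.1 kHz in the original now
    aliases irrecoverably down into the audio band."""
    return x[::2]
```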

Time to summarize the salient points regarding sample rates.

1.  Higher sample rates are not fundamentally (i.e. mathematically) necessary to encode the best quality sound, but they can ameliorate (or even eliminate) the need for brick-wall filters which can be quite bad for sound quality.

2.  Higher sample rates can encode higher frequencies than lower sample rates.  Emerging studies suggest that human perception may extend to frequencies higher than can be captured by CD’s 44.1kHz sample rate standard.

3.  Chances are that a high sample rate music track was produced by transcoding from an original which may have been at some other sample rate.  There is absolutely no way of knowing what the original was by examining the file, although pointers can be suggestive.

4.  There is no fundamental reason why 96kHz music cannot be as good as 88.2kHz music.  Likewise 192kHz and 176.4kHz.  However, since almost all music is derived from masters using a sample rate which is a multiple of 44.1kHz, if you purchase 24/96 or 24/192 music your hope is that high quality SRC algorithms were used to prepare them.

5.  Try to bear in mind, if your high-res downloads are being offered at 96kHz and 192kHz, it means your music vendor is maybe being run by people who pay more attention to their Marketing department than their Engineering department.  That’s not an infallible rule of thumb, but it’s a reasonable one.  (Incidentally, that is what happened to Blackberry.  It’s why they are close to bankruptcy.)

Music is, fundamentally, a set of variations in air pressure that our ears are able to interpret in a meaningful way.  Making music means creating those air pressure variations from scratch.  Reproducing music means using a representation of these air pressure variations to create as close a facsimile as possible.  For that to work we need a means to store that representation.  Analog systems exist to store them as variations in the magnetic field on a tape, or as variations in the depths of a groove spiralling round a disc or a cylinder.  These systems work, but there are limitations imposed by the physical properties of the tapes, discs, and cylinders, which dictate just how much information can be stored on them.  Digital systems can store the same information as a set of numbers, and, as with analog systems, there are limitations imposed upon digital systems, this time by mathematics rather than by any fundamental physical properties of the digital storage medium.

The idea behind a digital representation of an analog waveform is that you measure the analog waveform on a periodic basis and store the resulting numbers.  It has to be a periodic basis, because when it comes time to re-create the original waveform, you have to know the exact time when the measurement was made as well as the exact value that was measured.  You could choose to store the time value as an additional data point of its own, but that would make for an extremely complicated system.  Instead, we adopt the convention that the measurements are made on an exact regular basis, a certain specific number of times per second.  This is what we call the sample rate, and we call this method of digital representation PCM (Pulse Code Modulation).  The sample rate turns out to have a major impact on how the resultant PCM recording will sound.

Music is generally held to occupy the frequencies between 20Hz and 20kHz.  Very few people can hear frequencies as high as 20kHz, and for pretty well all of us, this upper limit of hearing falls progressively as we age.  But, in general we hold to the idea that to record music faithfully, we need to record all frequencies from 20Hz to 20kHz.  How does this impact the choice of sample rate for a PCM system?  The main consideration here is the well-known Nyquist-Shannon sampling theorem, which tells us that in order to make a digital representation of an analog signal of a certain frequency, it is necessary that the sampling rate is at least double that frequency.  It should be noted that this is not an approximation.  It is a mathematical fact.  We call the frequency which is one-half of the sample rate the Nyquist Frequency.  Nyquist-Shannon theory informs us that a PCM system can capture only those frequencies which lie below the Nyquist Frequency.

If music occupies a frequency range which tops out at 20kHz, then in order to represent it faithfully in a digital system Nyquist-Shannon says that we need to sample it at a sample rate no lower than 40kHz.  If that was all there was to it, life would be simple.  But Nyquist-Shannon theory tells us some other things too.  What happens if we try to encode a frequency that is above the Nyquist Frequency?  The answer is that it gets encoded very well.  But, unfortunately, the result is indistinguishable from what you would get if you instead recorded a certain frequency below the Nyquist Frequency.  If the Nyquist Frequency was 20kHz, then a 21kHz signal would be encoded exactly the same as a 19kHz signal; a 22kHz signal would be the same as an 18kHz signal; a 23kHz signal would be the same as a 17kHz signal; and so on.  This effect is called Aliasing (or Mirroring).  All information existing in the recording above the Nyquist Frequency would be Aliased (or Mirrored) to a corresponding frequency below it.  Such effects are – surprise! – destructive to the sound quality.
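This is easy to demonstrate. The sketch below uses the hypothetical 40kHz sample rate from the example above, and uses cosines so that the phases line up exactly as well as the frequencies:

```python
import numpy as np

# With a 40 kHz sample rate (Nyquist Frequency = 20 kHz), a 21 kHz
# tone and a 19 kHz tone produce literally identical sample values.
fs = 40_000
n = np.arange(400)
tone_21k = np.cos(2 * np.pi * 21_000 * n / fs)
tone_19k = np.cos(2 * np.pi * 19_000 * n / fs)
print(np.allclose(tone_21k, tone_19k))   # True
```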

The solution to this problem is to pass the analog signal through a low-pass filter whose function is to filter out all the high frequencies.  This is not as simple as it sounds.  In theory you would want a filter that massively attenuates everything above 20kHz and nothing below it.  This type of filter is called a brick-wall filter, for obvious reasons.  The problem is that a real-world brick-wall filter makes the transition from flat to attenuating over a range of frequencies that you might think of as a no-man’s land.  Within the no-man’s land the attenuation of the filter is not high enough to prevent aliasing, and not low enough to avoid audibly affecting the music signal.  Therefore the no-man’s land must occupy a range of frequencies above the maximum frequency of the music content, but below the Nyquist Frequency.  For this reason, the Nyquist Frequency should always be higher than the maximum signal frequency.

It turns out that making this no-man’s land as small as it practically can be is a question of how we design the brick-wall filter, something I will come back to in a moment.  Anyway, applying this kind of thinking to the case in point, we end up moving the Nyquist frequency up a bit from 20kHz to 22.05kHz.  Recall that the sample rate is twice the Nyquist Frequency.  That puts the sample rate at a familiar number – 44,100 samples per second.  This is the thinking that gave birth to the CD format.

At this point we are still not quite finished with Aliasing.  Recall that signals above the Nyquist Frequency that are encoded into the data stream cannot be distinguished from their Aliased counterparts below the Nyquist Frequency.  The same is true in reverse during playback.  For every frequency the DAC generates below the Nyquist Frequency, it also generates a companion at its Alias frequency.  All those aliases are above the Nyquist Frequency, and we need to filter those out during playback.  This requires another brick wall filter similar to the one we implemented for the recording process.

Summarizing the above, then, all we need is a sample rate a little bit above twice the maximum frequency we want to record, plus a brick-wall filter, and we’re good to go.  Whoa boy!  Not so fast…

There were two assumptions that we made along the way, one overtly, and one covertly by accepting something blindly without questioning it.  The first assumption was that it is acceptable to restrict the frequency content to 20kHz because nobody can hear anything above that.  It turns out this is not quite correct, depending on how pedantic you want to be about defining the word “hear”.  Hot off the press, the latest research has thrown up an interesting result.  Working with subjects who have taken a conventional listening test, and who are clearly unable to discern any audio above 20kHz, scientists have wired their brains up to the latest in scientific instruments, and have shown that their brains do in fact react quite unambiguously to the presence of audio signals at frequencies significantly above 20kHz, which the subjects themselves appear to be blissfully unaware of.  This is ongoing research, so it is too early to draw conclusions as to what this means, but maybe it points to a rationale for extending the bandwidth of our recordings up from 20kHz.  But how far?  30kHz?  50kHz?  We don’t have any answers yet.

The second assumption is less esoteric.  We dismissed the brick wall filters as just another circuit element that we could add at our whim.  We declined to consider what – if anything – their audible contribution might be.  This was not wise.  With 16-bit audio, this filter’s job is to be flat up to 20kHz, and thereafter to roll off rapidly to the point where it is 96dB down by the time it reaches 22.05kHz.  That is one monster mother of a filter, and typically would involve a huge component count including capacitors, inductors, and in many cases op amps, all of which are the sorts of components high-end audio circuit designers go to fantastic lengths to eliminate from their signal paths.  Even with the best conceivable design in the world, where the filter’s frequency response is nice and flat, and its phase response is nice and linear, and it still meets its attenuation requirements, such filters are going to have an audible impact on the signal passing through them.  And you have one of them at each of the A-to-D and D-to-A ends of the digital audio chain.  There is an argument to be made that the sound of CD is not so much the sound of 16-bits and 44.1kHz, as much as the sound of an analog brick-wall filter.

So how do different sample rates help here?  I will discuss this tomorrow in Part II.

In the last two posts I have introduced jitter as a digital phenomenon, explained what it is, what it does, how we can measure it, and discussed whether or not we can hear it. All this was started off by the observation that different “Bit Perfect” audio players can sound different, even though all they are doing is using the same hardware to send the same precise data to the same DAC. We postulated that, in the absence of anything else, the only possible mechanism to account for those differences had to be jitter. So we took a long hard look at jitter to see if it fit the bill. And maybe it does, but we didn’t exactly explain how software can affect jitter. Before doing so, we are going to take a fresh look at jitter, this time from an analog perspective.

You see, although it is easy to think of digital data in binary terms (it’s either a one or it’s a zero), processed at some specific instant in time (defined to the picosecond so we can account for “jitter“), in reality it is never quite so simple as that. Let’s look at a simple circuit, designed to take a digital input and, at a point in time defined by a clock signal, output a certain analog voltage. For the purposes of understanding the operation of jitter, there are two things happening. First of all, the circuit is monitoring the voltage which represents the clock signal, in order to determine when it is ticking and when it is tocking. And secondly, once the clock is triggered, the output circuitry is applying the desired voltage to the output.

We’ll start with the tick-tock of the clock. A clock signal is typically a square-wave waveform. It oscillates sharply between fully “on” (a high voltage) and fully “off” (a low voltage), and it does so very abruptly. Our circuit measures a “tick” when it detects the signal transitioning from low to high, and a “tock” when it detects a transition from high to low. Tick is generally the point at which the clock signal triggers an event, and the function of Tock is generally to mark the point at which the circuit starts looking again for the next Tick. Digital electronics are very easy for the layman to comprehend at a functional level, because the concepts are very easy to visualize. We can easily imagine a square-wave clock signal, and the nice, clean, rising and falling edges of the Tick and the Tock. At some point most of us have seen nice, helpful diagrams. What’s the problem?

A real-world clock signal has some real-world problems to deal with. First of all, we need to look at those high and low voltages that we are measuring. If we look closely enough, we will see that there is usually a lot of high-frequency noise on them. They are not the nice clean square waves we expected them to be. The impact of all this noise is that it gets harder to guarantee that we can distinguish the “high” voltage from the “low“. It is no use getting it right 99 times out of 100. We need to get it right a million times out of a million. The higher the speed of the clock, the worse this gets. But it is quite easy to fix. If the problem is caused by high frequency noise, all we need to do is to filter it out using a Low-Pass Filter. Let’s do that.

Now two things have happened. First, the noise on the clock signal has indeed been cleaned up, and we can accurately determine whether the clock voltage is “high” or “low”. But now we see that the transitions between high and low occur at a more leisurely pace. The exact point of the transition is no longer so clear. There is some uncertainty as to when the “Tick” actually occurs. Because of this filtering, there is a trade-off to be had between being able to detect IF a transition has occurred, and exactly WHEN it occurred. If our Low-Pass Filter rolls over at a frequency just above the clock frequency, we do a great job of filtering the noise, but it becomes correspondingly less certain WHEN the transition occurs – in other words we have higher jitter. The amount of uncertainty in the WHEN can be approximated as the inverse of the roll-over frequency of the filter. The roll-over frequency is therefore normally the highest we can make it without compromising our ability to be certain about detecting the IF. Therefore, if we need to be able to function with a higher roll-over frequency, and so reduce the uncertainty in the WHEN, we need to re-design the circuit to reduce the noise in the first place.
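For the curious, this trade-off is easy to reproduce in a toy simulation. The sketch below is entirely my own construction and bears no relation to any real clock circuit: it adds white noise to a clean rising edge, low-pass filters it at two different roll-over frequencies, and measures how much the detected crossing time wanders from run to run.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs = 1e9                                # 1 GHz simulation rate
t = np.arange(2000) / fs                # 2 us of 'scope trace'

def crossing_time(trace):
    """Linearly interpolate the instant at which the trace crosses 0.5."""
    i = np.argmax(trace > 0.5)
    frac = (0.5 - trace[i - 1]) / (trace[i] - trace[i - 1])
    return (i - 1 + frac) / fs

def edge_jitter(cutoff_hz, trials=200):
    """Spread (std dev, in seconds) of the detected edge time for a
    noisy rising edge that is low-pass filtered before detection."""
    b, a = signal.butter(2, cutoff_hz, fs=fs)
    times = []
    for _ in range(trials):
        clean = (t > 1e-6).astype(float)            # edge at 1 us
        noisy = clean + 0.1 * rng.standard_normal(len(t))
        times.append(crossing_time(signal.lfilter(b, a, noisy)))
    return np.std(times)

# The lower roll-over frequency cleans the waveform up beautifully,
# but the detected edge time wanders more from run to run: higher jitter.
print(edge_jitter(200e6), edge_jitter(20e6))
```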

The take-away from all this is that the presence of high frequency noise is in itself a jitter-like phenomenon.

One way to design an accurate clock is to run the clock “N” times faster than we actually need, and rather than count every Tick, we count every “N” Ticks. We call this process “clock multiplication“, and we can – in principle – achieve an arbitrarily low jitter on our clock by continuing to increase the clock multiplier. This is, in fact, the way all real-world clocks are built. Any way you do it, though, it gets exponentially more expensive the faster the clock gets, due to an increasingly more arduous noise problem. Wickedly so, in fact. If you are a DAC manufacturer, it really is a simple question of how much you want to pay to reduce your master clock jitter!

And it’s not just the clock itself that has to be high speed. For example, any circuit anywhere in your DAC that needs to operate such that events can be synchronized to within 1ns must, by definition, have a bandwidth exceeding 1GHz. That, dear reader, is one heck of a high bandwidth. Not only does your clock circuitry have to have GHz bandwidth, so does the converter circuitry which is communicating with it to synchronize its measly 44.1kHz operation with your GHz clock. Otherwise – in principle at least (because nothing is ever so black and white) – you would be wasting the expense you went to in generating your super-clock in the first place. In any case, it becomes a given that a real-world DAC will be constructed with electronics having a bandwidth which – if not necessarily in the exotic territory of GHz – will still be much higher than the sample rates of the conversions it is tasked to perform.

Designing a circuit with RF (Radio Frequency) bandwidth, and having it exhibit good immunity from RF noise, is generally a no-win prospect. When dealing with RF, every problem you solve here results in a new one popping up there. RF circuits are inherently sensitive to RF interference, and so you need to design them, and package them, in such a way as to make them immune from the effects of external RF.

External RF is everywhere. It has to be, otherwise your cell phone wouldn’t work. In a circuit, RF does not just flow neatly along wires from one component to the next. It also radiates from the wires and from the components, all the time. And it is not just RF circuits that generate RF. Your house wiring is rife with RF. Just about every electrical appliance you own – all the way down to the dimmer switches on your lights – emits RF. My central heating just turned on – sending a massive RF spike throughout the house, not to mention through the electrical mains line to my neighbour’s house. As a DAC designer, you can do your job diligently to protect yourself from all this, and at least minimize the ability of RF to sneak into your circuit from the surroundings. But you can’t do much about the RF that sneaks in through connections that are designed to transmit RF bandwidths in the first place! Such as the USB and S/PDIF connectors through which your music data enters the DAC.

A USB connector is by design a high bandwidth port. The USB spec calls for it to be so. Within its defined bandwidth, any RF noise injected at one end will be faithfully transmitted down the cable and out the other end. And, in your case, straight into your DAC. This will be so, even if you don’t need all that bandwidth to transmit the music data. The USB connection is indeed a very noisy environment. This is because, in order to just transmit raw data, you are fully prepared to sacrifice timing accuracy (the WHEN) for data accuracy (the IF). Therefore, so long as the amount of spurious RF interference injected into the link is not so much as to compromise the data transmission, the intended performance of the USB link is being fully met. So, if the internals of a computer are generating a lot of spurious RF, there is good reason to imagine that a lot of it is going to be transmitted to any device attached to it via a USB cable. Such as your DAC.

What are the sources of spurious RF inside a computer? For a start, every last chip, from the CPU down, is a powerful RF source. And the harder these chips work – the more cycles they run – the more RF they will emit. Disk drives generate lots of RF noise, as do displays, ethernet ports, bluetooth and WiFi adapters.

So it is not too much of a stretch to imagine that playback software which uses more CPU, hits the hard disk more often, makes things happen on the display, and communicates over your networks has the potential to impact the sound being played on the DAC connected to the computer. Not through jitter, but through RF interference, whose effects we now see can be hard to distinguish from those of jitter.

This, we believe, is the area in which BitPerfect does its thing. Through trial and error, we have established what sort of software activities result in sonic degradation, and which ones don’t. It doesn’t mean we have the exact mechanism nailed down, but it does mean that we have at least the basics of a handle on the cause-and-effect relationships.