Yesterday, I introduced you to the concept of Jitter, and showed how it has the potential to disrupt the accuracy of digital audio playback.  We saw how the measures necessary to eliminate jitter as a problem can impose unpleasant and challenging constraints upon the designer of an audiophile-grade DAC.  It would be reasonable for us to ask what the audible effects of this jitter actually are, and how we can determine the efficacy with which a DAC design has addressed it.  This post attempts to address these questions.

This analysis is all about jitter as a digital phenomenon, by which we mean that we are concerned only with the notional effect of playing back the wrong signal at the right time (or vice-versa).  We assume that the only effect of jitter is that which we have described here as arising from very small timing errors.  Can we calculate what audible or measurable effect such timing errors can have?  Let's take a look.

The first thing we have to consider is what we call the distribution of jitter timing errors.  As a trivial example, let us imagine that every single timing point is subject to a jitter-based timing error of 100ns (100 nanoseconds – see yesterday’s post for an explanation).  100ns is a large value, and experiments have suggested that jitter of this magnitude is quite audible.  In our trivial example suppose each and every sample is delayed by exactly 100ns.  What we have in fact accomplished is to simply take perfect playback and delay it by 100ns.  We could achieve exactly the same thing by moving our loudspeakers back by a hair’s breadth.  It certainly won’t affect the sound quality.  So just having 100ns of jitter is not in and of itself enough to cause a problem.  What we need to see is an uncertainty or variability in the precise timing of the sampling process, in other words the individual samples end up being off by unknown amounts which could average out to 100ns.  In this case, some samples are delayed, some are advanced, and some are not affected.  Some of these errors are large, and some are small.  We don’t know which are which.  All we know is that, on average, the errors amount to some 100ns.  In other words, there is a “distribution” of timing errors.  Clearly, it is possible to imagine how the audible effects of such a collection of timing errors might be affected by exactly how these errors are distributed.
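To make the distinction concrete, here is a minimal sketch in Python. The sample rate, the 10kHz test tone, the 343 m/s speed of sound, and the choice of a zero-mean Gaussian spread of timing errors are all my own assumptions for illustration; nothing here represents how a real DAC behaves. It compares a fixed 100ns delay with timing errors that vary from sample to sample, and works out how far a loudspeaker would have to move to produce the same fixed delay.

```python
import math
import random

FS = 44_100.0            # sample rate in Hz (assumed)
F0 = 10_000.0            # test tone in Hz (assumed)
N = 4_096                # number of samples simulated
ERROR = 100e-9           # 100 ns timing error
SPEED_OF_SOUND = 343.0   # m/s in room-temperature air (assumed)

def tone(t):
    """Ideal, unit-amplitude test tone evaluated at time t (seconds)."""
    return math.sin(2 * math.pi * F0 * t)

def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

rng = random.Random(0)   # fixed seed so the run is repeatable
times = [n / FS for n in range(N)]

# Case 1: every sample is late by exactly 100 ns.
fixed = [tone(t + ERROR) for t in times]
# Case 2: each sample is early or late by a random amount (100 ns RMS,
# zero mean; this particular distribution is an assumption).
varying = [tone(t + rng.gauss(0.0, ERROR)) for t in times]

# Case 1 is just perfect playback heard 100 ns later, so compare it with
# the delayed tone. Case 2 has no average delay to remove.
fixed_error = rms([y - tone(t + ERROR) for y, t in zip(fixed, times)])
varying_error = rms([y - tone(t) for y, t in zip(varying, times)])

# Distance sound travels in 100 ns: the equivalent loudspeaker shift.
speaker_shift_m = ERROR * SPEED_OF_SOUND

print(f"speaker shift for 100 ns: {speaker_shift_m * 1e6:.1f} micrometres")
print(f"fixed delay, residual error:   {fixed_error:.2e}")
print(f"varying delay, residual error: {varying_error:.2e}")
```

The fixed delay leaves nothing behind once the delay itself is accounted for, whereas the varying errors leave a residual that no single delay can explain.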

In order to analyze what the effects of jitter might be, we have to classify those effects according to different types of distributions – the way in which the precise jitter values vary from one sample to the next.  The best way to begin is to divide these distributions into two categories – correlated and uncorrelated jitter.  Uncorrelated jitter is the easiest to understand.  The jitter value for any sample is totally random.  It is like rolling a die – there is absolutely no way to predict what any one given jitter value will be.  As we will see, uncorrelated jitter is the easiest form of jitter to analyze.  All other forms of jitter are, by definition, correlated jitter.  The jitter values correlate to some degree or another with some other property.  Correlation does not necessarily mean that the exact jitter value is determinable.  It can be more like rolling a loaded die.  Some values end up being more likely than others.  Analysis of correlated jitter is way more challenging.

Uncorrelated (or random) jitter turns out to be very similar to dithering, which I treated in an earlier post.  Uncorrelated jitter introduces a random error into the value of each sample.  The effect of this is quite simple – it increases the noise floor.  The analysis is slightly complicated by the fact that the amount of noise is dependent on the frequency spectrum of the signal, but in general this analysis shows that noise floor increases of 10dB and more across the entire audio spectrum can be obtained with as little as 1ns of uncorrelated jitter.  However, as with dithering, this noise may not be perceptible, as it may lie below the noise floor introduced by other (analog) components in the audio playback chain.  Qualitatively, uncorrelated jitter is not normally considered to be a significant detriment to the sound quality.
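The frequency dependence mentioned above can be illustrated with a rough simulation. This is only a sketch under my own assumptions: a single full-scale sine tone, Gaussian timing errors that are independent from sample to sample, and the standard small-jitter estimate that the signal-to-noise ratio is roughly 1/(2πfσ) for tone frequency f and RMS jitter σ. The function names are made up for the example.

```python
import math
import random

FS = 44_100.0   # sample rate in Hz (assumed)
N = 8_192       # number of samples simulated

def predicted_snr_db(f0, sigma):
    """Small-jitter estimate for a full-scale sine: SNR = 1 / (2*pi*f0*sigma)."""
    return -20 * math.log10(2 * math.pi * f0 * sigma)

def simulated_snr_db(f0, sigma, seed=0):
    """Sample a sine at randomly jittered instants and measure signal/error power."""
    rng = random.Random(seed)
    signal_power = 0.0
    error_power = 0.0
    for n in range(N):
        t = n / FS
        ideal = math.sin(2 * math.pi * f0 * t)
        # Each sample gets its own independent timing error (assumed Gaussian).
        actual = math.sin(2 * math.pi * f0 * (t + rng.gauss(0.0, sigma)))
        signal_power += ideal * ideal
        error_power += (actual - ideal) ** 2
    return 10 * math.log10(signal_power / error_power)

for f0 in (1_000.0, 10_000.0):
    print(f"{f0:>7.0f} Hz tone, 1 ns RMS jitter: "
          f"simulated {simulated_snr_db(f0, 1e-9):.1f} dB, "
          f"predicted {predicted_snr_db(f0, 1e-9):.1f} dB")
```

Under these assumptions the same 1ns of jitter costs 20dB more signal-to-noise ratio on a 10kHz tone than on a 1kHz tone, which is why the spectrum of the music matters when estimating the added noise.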

Correlated (or deterministic) jitter is a more complicated beast.  Correlated jitter may correlate with a number of factors, including the audio signal, the power supply (mains frequency and its harmonics), clock circuits, external sources of RF interference, and other factors which may be very difficult to pin down.  Its frequency spectrum and bandwidth need to be taken into account.  If the jitter behaves in a tightly deterministic manner, we can perform some very accurate mathematical analysis of its behaviour to determine its effect on the audio signal, but deviations from even the simplest forms of deterministic jitter make the analysis and its interpretation exponentially more difficult.

Let's take the simplest case of an audio signal comprising a single pure tone, and a jitter function which behaves as a pure sinusoid.  A simple Fourier Transform of the resultant audio signal will be found to exhibit the single peak of the original audio pure tone, plus two symmetrical side lobes.  The magnitude and separation of the side lobes will permit us to calculate both the frequency and magnitude of the jitter signal.  This is a highly specific and limiting case, and it is highly unlikely that any real-world jitter scenario would ever be that simple.  But for the most part it is all we have!
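This simplest case is easy to try out numerically. The sketch below is my own illustration, not a description of any real measurement tool: the sample rate, tone frequency, jitter frequency and 1ns peak jitter amplitude are arbitrary assumptions, chosen so that every frequency falls exactly on a Fourier bin. It relies on the standard small-angle result that each side lobe sits at a level of about π × (tone frequency) × (peak jitter) relative to the main tone.

```python
import cmath
import math

FS = 48_000.0     # sample rate in Hz (assumed)
N = 4_800         # samples analysed, giving 10 Hz bin spacing
F0 = 10_000.0     # audio tone in Hz (assumed)
FJ = 1_000.0      # jitter frequency in Hz (assumed)
J_PEAK = 1e-9     # peak jitter amplitude in seconds (assumed)

def jittered_tone():
    """Pure tone sampled at instants that wobble sinusoidally around the ideal ones."""
    samples = []
    for n in range(N):
        t = n / FS
        timing_error = J_PEAK * math.sin(2 * math.pi * FJ * t)
        samples.append(math.sin(2 * math.pi * F0 * (t + timing_error)))
    return samples

def amplitude_at(samples, freq):
    """Amplitude of the component at freq, via a single-bin Fourier sum."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)

samples = jittered_tone()
carrier = amplitude_at(samples, F0)        # the original tone
lower = amplitude_at(samples, F0 - FJ)     # lower side lobe
upper = amplitude_at(samples, F0 + FJ)     # upper side lobe

lower_dbc = 20 * math.log10(lower / carrier)
upper_dbc = 20 * math.log10(upper / carrier)
predicted_dbc = 20 * math.log10(math.pi * F0 * J_PEAK)

# Work backwards from the side-lobe height to the jitter amplitude.
estimated_jitter = (upper / carrier) / (math.pi * F0)

print(f"side lobes: {lower_dbc:.2f} dB and {upper_dbc:.2f} dB relative to the tone")
print(f"small-angle prediction: {predicted_dbc:.2f} dB")
print(f"jitter amplitude recovered from the side lobes: {estimated_jitter:.3e} s")
```

The spacing of the side lobes from the main peak gives the jitter frequency, and their height gives its amplitude. With these assumed values the side lobes sit roughly 90dB below the tone, which gives some idea of the resolution a real measurement needs.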

At this point, I would normally go into a little bit more detail on how real-world jitter measurements are performed, but it really is too complicated.  Suffice it to say that it typically depends on Fourier Analysis of an audio signal with a single tone, in some cases further modulated by a very low-level lower-frequency square wave which produces a family of reference harmonics, based on the analysis I described above.  The technique involves looking for pairs of symmetrical side lobes and attempting to infer the corresponding jitter contributions.  As you can see, this type of analysis will fail to take into account distributions of jitter which are not properly described by the (highly simplified) underlying mathematical model, and its accuracy will be limited by the validity of the model, which, as I have observed, gets waaaay more difficult to interpret as the modeled system gets more complicated.  The net effect is that these more elaborate analyses are limited by the assumptions that have to be made in order to make the math more manageable, and the accuracy of the results is in the end limited by the validity of the assumptions.  The disconnect between the two is a very real problem – compounded by the fact that the person using the analysis tool is not normally familiar with the underlying mathematics, nor the assumptions upon which it rests.

The audibility of these jitter modes is far more difficult to predict, even with the assistance of the limited mathematical modelling.  Unlike the results with uncorrelated (random) jitter, correlated jitter often results in specific frequency peaks.  This type of behaviour is more like distortion than it is noise, and we know that the human ear tends to be far less tolerant of distortions than noise, with some distortions (such as intermodulation distortion) being much worse than others (such as even harmonic distortion).  At this point, it is not possible to entirely dismiss the notion that some classes of jitter may be both impossible to observe and measure, yet at the same time deleterious to sound quality.

But is jitter really what I have described, and does it really impact the system in the way I have described it?  Or could something else be in play?  Tomorrow, in the final installment of this short series, I will start to consider this idea further.