Yesterday, I introduced you to the concept of jitter, and showed how it has the potential to disrupt the accuracy of digital audio playback.  We saw how the measures necessary to eliminate jitter as a problem can impose unpleasant and challenging constraints upon the designer of an audiophile-grade DAC.  It is reasonable to ask what the audible effects of this jitter actually are, and how we can determine the efficacy with which a DAC design has addressed it.  This post attempts to answer those questions.

This analysis is all about jitter as a digital phenomenon, by which we mean that we are concerned only with the notional effect of playing back the wrong signal at the right time (or vice-versa).  We assume that the only effect of jitter is that which arises from these minuscule timing errors.  Can we calculate what audible or measurable effect such timing errors have?  Let's take a look.

The first thing we have to consider is what we call the distribution of jitter timing errors.  As a trivial example, let us imagine that every single timing point is subject to a jitter-based timing error of 100ns (100 nanoseconds – see yesterday’s post for an explanation).  100ns is a large value, and experiments have suggested that jitter of this magnitude is quite audible.  In this trivial example, suppose each and every sample is delayed by exactly 100ns.  What we have in fact accomplished is simply to take perfect playback and delay it by 100ns.  We could achieve exactly the same thing by moving our loudspeakers back by a hair’s breadth.  It certainly won’t affect the sound quality.  So just having 100ns of jitter is not in and of itself enough to cause a problem.  What we need to see is an uncertainty or variability in the precise timing of the sampling process – in other words, the individual samples end up being off by unknown amounts which could average out to 100ns.  In this case, some samples are delayed, some are advanced, and some are not affected.  Some of these errors are large, and some are small.  We don’t know which are which.  All we know is that, on average, the errors amount to some 100ns.  In other words, there is a “distribution” of timing errors.  Clearly, it is possible to imagine how the audible effects of such a collection of timing errors might depend on exactly how those errors are distributed.
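To make that distinction concrete, here is a small simulation (my own sketch – the tone frequency, FFT length, and Gaussian jitter distribution are arbitrary choices of mine, not anything from the post).  A tone in which every sample is delayed by the same 100ns still produces a single clean spectral peak, whereas the same tone sampled with randomly distributed timing errors of similar magnitude raises the noise floor across the whole spectrum.

```python
import numpy as np

fs = 44_100
N = 4096
k = 93                       # FFT bin index -> tone lands exactly on a bin
f0 = fs * k / N              # ~1001 Hz test tone
t = np.arange(N) / fs

# Case 1: every sample delayed by the same 100 ns -- a pure time shift.
delayed = np.sin(2 * np.pi * f0 * (t - 100e-9))

# Case 2: each sample's instant is off by a random amount of ~100 ns RMS
# -- a "distribution" of timing errors.
rng = np.random.default_rng(0)
jittered = np.sin(2 * np.pi * f0 * (t + rng.normal(0.0, 100e-9, N)))

def noise_floor_db(x):
    """Median off-tone bin level, in dB relative to the tone bin."""
    s = np.abs(np.fft.rfft(x))
    tone = s[k]
    s[k - 1:k + 2] = 0.0     # blank the tone and its immediate neighbours
    return 20 * np.log10(np.median(s[1:]) / tone + 1e-30)

print(noise_floor_db(delayed))   # essentially numerical noise only
print(noise_floor_db(jittered))  # a substantially raised noise floor
```

The constant delay changes nothing measurable, while the distributed errors turn timing uncertainty into broadband noise.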

In order to analyze what the effects of jitter might be, we have to classify those effects according to different types of distributions – the way in which the precise jitter values vary from one sample to the next.  The best way to begin is to divide these distributions into two categories – correlated and uncorrelated jitter.  Uncorrelated jitter is the easiest to understand.  The jitter value for any sample is totally random.  It is like rolling a die – there is absolutely no way to predict what any one given jitter value will be.  As we will see, uncorrelated jitter is the easiest form of jitter to analyze.  All other forms of jitter are, by definition, correlated jitter.  The jitter values correlate to some degree or another with some other property.  Correlation does not necessarily mean that the exact jitter value is determinable.  It can be more like rolling a loaded die.  Some values end up being more likely than others.  Analysis of correlated jitter is way more challenging.

Uncorrelated (or random) jitter turns out to be very similar to dithering, which I treated in an earlier post.  Uncorrelated jitter introduces a random error into the value of each sample.  The effect of this is quite simple – it increases the noise floor.  The analysis is slightly complicated by the fact that the amount of noise is dependent on the frequency spectrum of the signal, but in general this analysis shows that noise floor increases of 10dB and more across the entire audio spectrum can be obtained with as little as 1ns of uncorrelated jitter.  However, as with dithering, this noise may not be perceptible, as it may lie below the noise floor introduced by other (analog) components in the audio playback chain.  Qualitatively, uncorrelated jitter is not normally considered to be a significant detriment to the sound quality.
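As a sanity check on that figure, here is a back-of-envelope calculation (my own arithmetic, not from the post – I assume a full-scale 10kHz tone as a demanding test signal).  Sampling a sine at an instant that is off by some error gives a value error of roughly the waveform's slope times that error, and for random jitter of RMS sigma this works out to a jitter-limited SNR of 1/(2*pi*f*sigma):

```python
import math

# Assumed worst-ish case: full-scale 10 kHz tone, 1 ns RMS random jitter.
f = 10_000        # tone frequency, Hz
sigma_j = 1e-9    # RMS uncorrelated jitter, seconds

# Jitter-limited SNR = 1 / (2*pi*f*sigma_j), independent of amplitude.
snr_jitter_db = 20 * math.log10(1 / (2 * math.pi * f * sigma_j))

# Textbook 16-bit quantisation SNR for comparison.
snr_16bit_db = 6.02 * 16 + 1.76

print(f"jitter-limited SNR : {snr_jitter_db:5.1f} dB")
print(f"16-bit quant. floor: {snr_16bit_db:5.1f} dB")
print(f"floor raised by    : {snr_16bit_db - snr_jitter_db:4.1f} dB")
```

For this tone, 1ns of random jitter sits well over 10dB above the 16-bit quantisation floor, consistent with the claim above.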

Correlated (or deterministic) jitter is a more complicated beast.  Correlated jitter may correlate with a number of factors, including the audio signal, the power supply (mains frequency and its harmonics), clock circuits, external sources of RF interference, and other factors which may be very difficult to pin down.  Its frequency spectrum and bandwidth need to be taken into account.  If the jitter behaves in a tightly deterministic manner, we can perform some very accurate mathematical analysis of its behaviour to determine its effect on the audio signal, but deviations from even the simplest forms of deterministic jitter make the analysis and its interpretation exponentially more difficult.

Let's take the simplest case of an audio signal comprising a single pure tone, and a jitter function which behaves as a pure sinusoid.  A simple Fourier Transform of the resultant audio signal will be found to exhibit the single peak of the original pure tone, plus two symmetrical side lobes.  The magnitude and separation of the side lobes will permit us to calculate both the frequency and magnitude of the jitter signal.  This is a highly specific and limiting case, and it is unlikely that any real-world jitter scenario would ever be that simple.  But for the most part it is all we have!
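That single-tone, single-sinusoid case is easy to simulate.  In this sketch (my own construction – the 11025Hz tone, ~1378Hz jitter frequency, and 10ns peak jitter amplitude are arbitrary, chosen so everything lands exactly on FFT bins and no windowing is needed), the height of the side lobes recovers the jitter magnitude via the narrow-band phase-modulation approximation:

```python
import numpy as np

fs, N = 44_100, 8192
k0, kj = 2048, 256                  # FFT bins for the tone and the jitter
f0, fj = fs * k0 / N, fs * kj / N   # 11025 Hz tone, ~1378 Hz jitter
tau = 10e-9                         # 10 ns peak sinusoidal jitter

t = np.arange(N) / fs
# Sample the tone at instants displaced by tau*sin(2*pi*fj*t):
x = np.sin(2 * np.pi * f0 * (t + tau * np.sin(2 * np.pi * fj * t)))

s = np.abs(np.fft.rfft(x))
carrier = s[k0]
side_db = 20 * np.log10(s[k0 + kj] / carrier)   # upper side lobe, dBc

# Narrow-band PM theory predicts side lobes at f0 +/- fj, each at
# 20*log10(pi*f0*tau) dBc -- so the side lobe height reveals tau.
predicted_db = 20 * np.log10(np.pi * f0 * tau)
print(side_db, predicted_db)
```

The lower side lobe at f0 - fj has the same magnitude, which is exactly the symmetry that jitter-measurement techniques look for.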

At this point, I would normally go into a little bit more detail on how real-world jitter measurements are performed, but it really is too complicated.  Suffice to say that it typically depends on Fourier analysis of an audio signal containing a single tone, in some cases further modulated by a very low-level, lower-frequency square wave which produces a family of reference harmonics, based on the analysis I described above.  The technique involves looking for pairs of symmetrical side lobes and attempting to infer the corresponding jitter contributions.  As you can see, this type of analysis will fail to take into account distributions of jitter which are not properly described by the (highly simplified) underlying mathematical model, and its accuracy will be limited by the validity of the model, which, as I have observed, gets waaaay more difficult to interpret as the modeled system gets more complicated.  The net effect is that these more elaborate analyses are limited by the assumptions that have to be made in order to make the math manageable, and the accuracy of the results is in the end limited by the validity of those assumptions.  The disconnect between the two is a very real problem – compounded by the fact that the person using the analysis tool is not normally familiar with the underlying mathematics, nor with the assumptions upon which it rests.

The audibility of these jitter modes is far more difficult to predict, even with the assistance of the limited mathematical modelling.  Unlike the results with uncorrelated (random) jitter, correlated jitter often results in specific frequency peaks.  This type of behaviour is more like distortion than noise, and we know that the human ear tends to be far less tolerant of distortion than of noise, with some distortions (such as intermodulation distortion) being much worse than others (such as even harmonic distortion).  At this point, it is not possible to entirely dismiss the notion that some classes of jitter may be impossible to observe and measure, yet at the same time deleterious to sound quality.

But is jitter really what I have described, and does it really impact the system in the way I have described it?  Or could something else be in play?  Tomorrow, in the final installment of this short series, I will start to consider this idea further.

I want to address a critical phenomenon for which there isn’t an adequate explanation, and provide a rationale for it in terms of another phenomenon for which there isn’t an adequate explanation.  Pointless, perhaps, but it is the sort of thing that tends to keep me up at nights.  Maybe some of you too!

Most of you, being BitPerfect Users, will already know that while BitPerfect achieves “Bit Perfect” playback (when configured to do so), so can iTunes (although configuring it can be a real pain in the a$$).  Yet, I am sure you will agree, they manage to sound different.  Other “Bit Perfect” software players also manage to sound different.  Moreover, BitPerfect has various settings within its “Bit Perfect” repertoire – such as Integer Mode – which can make a significant difference by themselves.  What is the basis for this unexpected phenomenon?

First of all, we must address the “Flat Earth” crowd, who will insist that there cannot possibly BE any difference, and that if you say you can hear one, you must be imagining it.  You can spot them a mile away.  They will invoke the dreaded “double-blind test” at the drop of a hat, even though few of them actually understand the purpose and rationale behind a double-blind test, and fewer still have ever organized or participated in one.  I tried to set up a series of publicly-accessible double-blind tests at SSI 2012 with the assistance of a national laboratory’s audio science group.  They couldn’t have shown less interest if I had proposed to infect them with anthrax.  Audio professionals generally won’t touch a double-blind test with a ten-foot pole.  Anyway, as far as the Flat Earth crowd are concerned, this post, and those that follow, are all about discussing something that doesn’t exist.  Unfortunately, I cannot make the Flat Earthers vanish simply by taking the position that they don’t exist!

For the rest of you – BitPerfect Users, plus anyone else who might end up reading this – the effect is real enough, and a suitable explanation would definitely be in order.  That is, if we had one for you.

If it is not the data itself (because the data is “Bit Perfect”), then we must look elsewhere.  But before we do, some of you will ask “How do we know that the data really is Bit Perfect?”, which is a perfectly reasonable question.  It is not one I am going to dwell on here, except to say that it has been thoroughly shaken down.  Verifying it over USB is actually quite easy (in the sense of not being technically challenging), although doing so over S/PDIF requires an investment in very specific test equipment.  The bottom line, though, is that this has been done and nobody holds any lingering concerns over it.  I won’t address it further.

As Sherlock Holmes might observe, once we accept that the data is indeed “Bit Perfect”, the only thing that is left is a phenomenon most of us have heard of, but few of us understand – jitter.  Jitter was first introduced to audiophiles in the very early 1990s as an explanation for why so many people professed a dislike for the CD sound.  Digital audio comprises a bunch of numbers that represent the amplitude of a musical waveform, measured (“sampled” is the term we use) many thousands of times per second.  Some simple mathematical theorems can tell us how often we need to sample the waveform, and how accurate those sample measurements need to be, in order to achieve specific objectives.  Those theorems led the developers of the CD to select a sample rate of 44,100 times per second, and a measurement precision of 16-bits.  We can play back the recorded sound by using those numbers – one every 1/44100th of a second – to regenerate the musical waveform.  This is where jitter comes in.  Jitter reflects a critical core fact – “The Right Number At The Wrong Time Is The Wrong Number”.

Jitter affects both recording and playback, and only those two stages.  Unfortunately, once it has been embedded into the recording you can’t do anything about it, so we tend to think of it only in terms of playback.  But I am going to describe it in terms of recording, because it is easier to grasp that way.

Imagine a theoretically perfect digital audio recorder recording in the CD format.  It is measuring the musical waveform 44,100 times a second.  That’s one datapoint every 23 microseconds (23 millionths of a second).  At each instant in time it has to measure the magnitude of the waveform, and store the result as a 16-bit number.  Then it waits another 23 microseconds and does it again.  And again, and again, and again.  Naturally, the musical waveform is constantly changing.  Now imagine that the recorder mistakenly takes its reading a smidgeon too early or too late.  It will measure the waveform at the wrong time.  The result will not be the same as it would have been if it had been measured at the right time, even though when the measurement was taken, it was taken accurately.  We have measured the right number at the wrong time, and as a result it is the wrong number.  When it comes time to play back, all the DAC knows is that the readings were taken 44,100 times a second.  It has no way of knowing whether any individual readings were taken a smidgeon too early or too late.  A perfect DAC would therefore replay the wrong number at the right time, and as a result it will create a “wrong” waveform.  These timing errors – these smidgeons of time – are what we describe as “Jitter”.  Playback jitter is an identical problem.  If the replay timing in an imperfect real-world DAC is off by a smidgeon, then the “right” incoming number will be replayed at the “wrong” time, and the result will likewise be a wrong waveform.

Just how much jitter is too much?  Let’s examine a 16-bit, 44.1kHz recording.  Such a recording will be bandwidth limited theoretically to 22.05kHz (practically, to a lower value).  We need to know how quickly the musical waveform could be changing between successive measurements.  The most rapid changes generally occur when the signal comprises the highest possible frequency at the highest possible amplitude.  Under these circumstances, the waveform can change from maximum to minimum between adjacent samples.  A “right” number becomes a “wrong” number when the error exceeds the precision with which we can record it.  A number represented by a 16-bit integer can take on one of 65,536 possible values.  So, a 16-bit number which changes from maximum to minimum between adjacent samples cycles through 65,536 distinct values between samples.  Therefore, in this admittedly worst-case scenario, we will record the “wrong” number if our “smidgeon of time” exceeds 1/65,536 of the time between samples, which you will recall was 23 millionths of a second.  That puts the value of our smidgeon at 346 millionths of a millionth of a second.  In engineering-speak that is 346ps (346 picoseconds).  That’s a very, very short time indeed.  In 346ps, light travels 4 inches.  And a speeding bullet will traverse 1/300 of the diameter of a human hair.
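The arithmetic of that worst-case scenario is easy to reproduce:

```python
# The worst-case arithmetic from the paragraph above, reproduced directly.
sample_period = 1 / 44_100            # ~23 microseconds between samples
levels = 2 ** 16                      # 65,536 distinct 16-bit values
smidgeon = sample_period / levels     # worst-case tolerable timing error

print(f"sample period: {sample_period * 1e6:.1f} us")
print(f"smidgeon:      {smidgeon * 1e12:.0f} ps")   # ~346 ps
```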

I have just described jitter in terms of recording, but the exact same conditions apply during playback, and the calculations are exactly the same.  If you want to guarantee that jitter will not affect CD playback, it has to be reduced to less than 346ps.  However, in the real world, there are things we can take into account to alleviate that requirement.  For example, real-world signals do not typically encode components at the highest frequencies at the highest levels, and there are various sensible theories as to how to better define our worst-case scenario.  I won’t go into any of them.  There are also published results of real-world tests which purport to show that for CD playback, jitter levels below 10ns (ten nanoseconds; a nanosecond is a thousand picoseconds) are inaudible.  But these tests are 20 years old now, and many audiophiles take issue with them.  Additionally, there are arguments that higher-resolution formats, such as 24-bit 96kHz, have correspondingly tighter jitter requirements.  Let’s just say that it is generally taken to be desirable to get jitter down below 1ns.

If you require the electronics inside your DAC to deliver timing precision somewhere between 10ns and 346ps, this implies that those electronics must have a bandwidth of somewhere from 100MHz to 3GHz.  That is RF (Radio Frequency) territory, and we will come back to it again later.  Any electronics engineer will tell you that electrical circuits stop behaving sensibly, logically and rationally once you start playing around in the RF bands.  The higher the bandwidth, the more painful the headaches.  Electronics designers who work in RF are in general a breed apart from those who work in AF (the Audio Frequency band).
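Those bandwidth figures appear to follow from taking the reciprocal of the required timing precision (my reading of how the 100MHz to 3GHz range arises – the post does not show the calculation):

```python
# Reciprocal of timing precision as a rough bandwidth requirement.
for precision in (10e-9, 346e-12):
    bandwidth_mhz = 1 / precision / 1e6
    print(f"{precision * 1e12:6.0f} ps  ->  ~{bandwidth_mhz:5.0f} MHz")
```

10ns corresponds to 100MHz, and 346ps to roughly 2.9GHz, matching the range quoted above.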

The bottom line here is that digital playback is a lot more complicated than just getting the exact right bits to the DAC.  They have to be played back with a timing precision which invokes unholy design constraints.

Tomorrow I will talk about the audible and measurable effects of jitter.

Chamber Music. Even the term itself is enough to put people off. It is a genre which many people file in the same folder as waterboarding. And in truth, on occasion it does feel like it belongs there.

There is one chamber work, though, which I would encourage anybody for whom music – in whatever form – is an important part of their life to take some time aside to sit down and listen to. It is arguably the greatest chamber work ever written: Schubert’s String Quintet D956, composed just two months before his untimely death from syphilis at the age of 31. Listen to it in a dark room, on headphones, accompanied by a glass of your finest single malt scotch, having secured iron-clad assurances, on pain of death, that under no circumstances will you be disturbed. This is music that entwines itself with your very soul, poses questions you cannot answer, and satisfies longings you never knew you craved.

A String Quartet is a standard musical ensemble, comprising two violins, a viola and a cello. A String Quintet, on the other hand, is a more flexible designation – the fifth player is usually another viola, but in this case a second cello is called for. Two cellos would suggest a sonic imbalance in the bass, but in the expert hands of Franz Schubert it instead adds an almost symphonic depth to the soundscape. A great performance can make you think you are listening to a chamber orchestra. Performances of D956 fall into two categories. Because of the stature of the piece, it is often performed by an ensemble of soloist superstars, gathered for the task, more with an eye on the box office than an ear to the music. The standard alternative is to take an established String Quartet and add an accomplished solo cellist. The choice of that second cellist is an existential one, since this part drives and leads much of what will come to define the performance.

I have alluded to the symphonic nature of the piece. Indeed, on closer inspection it can come across as a chamber transcription of a bigger piece. Go play Wagner’s “Siegfried Idyll” and imagine what his orchestration of D956 could have sounded like. On the other hand, we are talking about one of music’s great masterpieces here, and as Fats Waller said, “If you don’t know what it is, don’t mess with it”. Written merely four years after Beethoven’s iconic ninth symphony, D956 looks forward to Mahler more than it does back to Beethoven. It is more profound and introspective, less overtly melodic than Beethoven – you won’t be humming its tunes on your way home from the office – and its developmental structure is more complex and elaborate. D956 is all about soundscapes, textures, and moods, right the way through to the bizarre final chord, which comes across like a bum left-hand note played by an over-excited pianist who leaps too high on his final flourish and lands in the wrong place (I confess, I don’t know what Schubert had in mind there).

I have yet to come across a “definitive” recording of D956. I have four, by the Emerson, Takács, Tokyo, and Vellinger string quartets, each with a guest cellist. Each has something to be said for it. The Emerson is notable for its great tonal beauty, the Takács for its liquid playing, the Tokyo is the most classically refined, and the Vellinger offers an ascetic, soul-baring honesty. As a purely personal opinion, I tend to gravitate to the Vellinger, which is hard to come by because it was a free giveaway with the BBC Music Magazine about 20 years ago, so it is unfair to recommend, but to me it best captures the soul of the piece. But all four paint dramatically different pictures, with the contrast between the Emerson (imagine Iván Fischer conducting) and the Vellinger (imagine Pierre Boulez conducting) occupying the extremes. Continuing with that analogy, the Tokyo could be Arturo Toscanini, and the Takács perhaps even Carlos Kleiber. They’re all very, very good, and the differences are primarily of style rather than musicianship.

It may be Chamber Music, but it is magnificent.

Check out Light Harmonic’s new crowd-funding campaign – the “Geek Pulse”! Yes, the same Light Harmonic, maker of the mega-buck Da Vinci DAC, is now developing a product at the other end of the price spectrum, bringing ultra high-resolution PCM, together with the very latest in DSD playback support, to the market at a VERY affordable price. I can’t wait to get my hands on one!

It has taken me longer than usual to finally pronounce on iTunes 11.1.2, but here I am. I wanted to take a little longer because a very small number of users have posted on our Facebook page, and also through the e-mail support line, that they have encountered unexpected problems after installing the combination of OS/X Mavericks and iTunes 11.1.2.

Here at BitPerfect I have been running that combination for two days solid and have not had a single problem. Furthermore, one or two of those users who did encounter problems have reported that these problems have suddenly vanished.

On balance, therefore, I don’t really see any good reason why you should not all make the update if you want to. I suspect, by the way, although I am not certain about this, that if you upgrade to OS/X Mavericks, you might get iTunes 11.1.2 as part of the package, whether you want it or not.

Now that OS/X Mavericks has been released, we can finally announce something we have known since the summer, but have been forbidden from disclosing. After a two year absence under Lion and Mountain Lion, Integer Mode is back again with OS/X Mavericks, and BitPerfect 1.0.8 already includes the software necessary to support it.

We have been using Integer Mode under BitPerfect 1.0.8 (and also using our own pre-release betas) since early summer, ever since we received the first pre-release betas of Mavericks. Both Mavericks itself, and its Integer Mode functionality under BitPerfect have been totally problem-free.

We are therefore confident that BitPerfect Users can upgrade to Mavericks.

Be aware that not all DACs support Integer Mode.  And I don’t know of anybody out there who is maintaining an up-to-date list of Integer Mode compatible DACs, so please do not ask me for advice on that subject 🙂

A busy morning here at BitPerfect Global HQ.  OS/X Mavericks has been released, together with an update to iTunes, version 11.1.2.  I am currently using both, and everything seems to be working just fine, but I am also getting reports from BitPerfect users who are encountering problems.

I have been using Mavericks in its pre-release forms for some time now, and have never encountered a problem with it, so I really don’t see any reason why BitPerfect users should not be able to upgrade with confidence.

On the other hand, iTunes updates have always been a cause for concern, so I recommend that BitPerfect users hold back from updating iTunes for the time being.  I will post an update here in due course.

Last night, on TV, my wife and I watched an episode of the current season of “The Amazing Race”.  In our house we have a modest, but surprisingly effective, home theater system permanently connected to our TV set.  It is in play regardless of what we are watching on TV.  Our TV signal is derived from satellite, time-shifted on our PVR, and the show was on an HD channel with the sound encoded in Dolby Digital.  The sound delivered by this system is normally very clear, but in this case it was an absolute cacophony, and I can only describe it as a shameless assault on my ears.

The Amazing Race sees itself as a non-stop action show, punctuated by the occasional pause for an interlude of weepy all-American sentimentalism.  It plays against a continuous background of “Action Movie!!” orchestral blasts – noisy, percussive, syncopated.  No melody at all, and no let-up in its intensity.  It is mixed with the maximum possible amount of compression, and presented at the maximum volume, so that it is continuously, relentlessly loud.

The show also provides a commentary, delivered by a shouting host, interspersed with snippets of interjections from the various participants.  The commentary track is separately mixed, and is also mastered with the maximum amount of compression, at the maximum volume.

Since the music track and the commentary track are each fully capable of drowning out the other, the producers have determined that the commentary track must take precedence.  So the loudness of the commentary track is used to modulate the loudness of the music track.  When somebody shouts, the music is briefly backed off a little, and immediately ramped back up afterwards, even if they are just pausing for breath.

The net effect is a relentless assault on the eardrums.  It makes it very hard to follow the dialog without getting a headache, and in fact makes watching the show a less than pleasant experience.  Your brain is not equipped to deal with such heavily compressed and modulated sounds, and goes into overload.  I found myself wondering if the CIA would have gotten into as much hot water as they did if they had used “The Amazing Race” instead of waterboarding.

This was way worse than even the last season (or was it the last-but-one season) of “House”, where there was no music track, but in its place the ambient background noises of the set were amplified to the point where it was dominated by hiss.  This hissy noise was then massively modulated by the dialog.  Again, a ruinous detraction from the enjoyment of the show.

These people need to get into another line of work.  Now there’s a thought…  Maybe these ARE the same people who got fired from the CIA for waterboarding!