Recently I have been conducting some rudimentary tests, not really having a firm idea where I was going with them, but sometimes good things pop out of the woodwork while you’re actually looking for something else. Unfortunately, nothing useful popped out of the woodwork on this occasion, but if nothing else, I thought I’d get a post out of it… 🙂
As those of you who read these posts regularly will know, we are on the brink of releasing a brand new product whose core competence is to produce high quality PCM versions of DSD files. The product is done – it’s just the nuts and bolts of the product launch that we are fiddling with. In developing this product, part of its optimization has been achieved purely on a mathematical basis, relying on detailed measurements and models. But, as always, the final product is fine tuned based upon what we hear, regardless of how it measures.
One thing that we consistently hear is that DSD versions of a track usually sound better than their PCM counterparts. Occasionally, there is not much to choose between them, but we rarely seem to get PCM that sounds better than its DSD counterparts, and I have spent a long time poring over reasons why that might be. I have some arm-waving ideas, but they are not yet well-developed enough for prime time, and will need major development at our end if they are ever going to get there. In the meantime, I confine myself to looking at the things I am able to look at, and seeing if something interesting just happens to jump out. Not much has, as yet.
Surprisingly, one of the things I have not done yet is the null-transform test. This is where you take a waveform A and transform it into another waveform B. Then, you transform B back into a copy of A, which I will call A*. It is easy to show whether A and A* are identical. You simply invert one of them and add the two together. If the two waveforms are identical, then the result will be absolute digital silence (a “null transform”). If the two are not identical, then the resultant waveform will comprise the differences between the two. Examining, or even just listening to, these waveforms can often tell you a lot about the nature of the differences. If A is a WAV file and B is a FLAC file, then the result should be a true null transform, where A and A* are identical. But if B is an MP3 file, then the result will most certainly not be null. I set about initiating a transform where B is a DSD file, just to see what gives.
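The mechanics of the test can be sketched in a few lines. This is a minimal illustration using NumPy, with made-up sample values standing in for real audio data; in practice the waveforms would of course be loaded from the A and A* files:

```python
import numpy as np

def null_residual(a: np.ndarray, a_star: np.ndarray) -> np.ndarray:
    """Invert one waveform and add it to the other; the sum is the difference signal."""
    n = min(len(a), len(a_star))  # compare over the common length
    return a[:n] + (-a_star[:n])

# Lossless round trip (e.g. WAV -> FLAC -> WAV): the residual is digital silence.
a = np.array([0.0, 0.5, -0.25, 0.125])
print(np.all(null_residual(a, a.copy()) == 0.0))   # True

# Lossy round trip (e.g. WAV -> MP3 -> WAV): a non-zero residual remains,
# and that residual is what you examine or listen to.
a_star = a + np.array([0.0, 0.001, -0.002, 0.0])   # made-up coding error
print(np.max(np.abs(null_residual(a, a_star))))    # magnitude of the difference
```
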
I decided to start off with a high-quality 24-bit 44.1kHz PCM file. I then up-sampled it to 24-bit 176.4kHz PCM using the SoX Linear SRC engine. I did that to ensure that there is no part of the signal within the frequency range where DSD’s shaped noise floor begins to rise sharply. That became my reference A file. I ran it through our ultra-high resolution (-300dB noise floor) FFT analyzer to make sure it contained no rogue frequency peaks, and sure enough it did not. I next used Korg Audiogate to create a DSD128 (5.6MHz) DSD B version. Finally I used DSD Master to create a 24-bit 176.4kHz PCM A* copy.
Listening to all three copies, A, B, and A*, I was struck by how similar to one another they all sounded. Frankly, I did wonder whether I would be able to tell them apart in a blind test, although some differences did emerge. I felt A had a touch more ‘sparkle’ to it. But the interesting part would be to do the null test by inverting A* and adding it to A. This presented some tricky problems. First, if there is any net gain (or loss) in the transformation, then this will show up massively as a difference in the null test and we don’t want that. Unfortunately, although Korg’s Audiogate does have a “gain” setting, this does not (for various entirely legitimate reasons) necessarily translate to an absolute peak signal value in the DSD file. And since I cannot be sure what the signal reference level is in the DSD “B” version, I can’t set a correct gain setting in DSD Master. So I left that on “Normalize”, which produces an A* PCM file normalized to 0dB – in other words with the maximum possible resolution.
Loading the A and A* files into Audacity to do the null transform, it was a simple matter to invert A*. Then I needed to “normalize” the A file. Finally, I needed to time align the two files. For various reasons, the transforms had left the files non-aligned temporally. By looking for sharp peaks in the music waveform (fortunately I was able to find one quite easily), I was able to use Audacity’s drag tool to visually align the signals to the nearest sample. Finally, I nulled the two files together. Immediately, Audacity showed me that there was a very substantial residual. Looking at Audacity’s FFT of the null signal, I could see that its 20-20kHz band had essentially the same shape as the spectrum of the original A file, but was depressed by about 40-60dB. There was nothing in the spectrum of the null signal that seemed to be telling me anything obviously useful.
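The by-eye alignment and normalization could in principle be automated. Here is a minimal sketch, assuming NumPy, that aligns to the nearest sample via cross-correlation and matches overall RMS level before nulling. This is not how Audacity does it internally; it is just an illustration of the idea:

```python
import numpy as np

def align_and_null(a: np.ndarray, a_star: np.ndarray) -> np.ndarray:
    """Time-align a_star to a (to the nearest sample), match its RMS level,
    then return the null signal a - a_star_aligned."""
    # Find the integer lag that maximizes the cross-correlation.
    corr = np.correlate(a, a_star, mode="full")
    lag = int(np.argmax(corr)) - (len(a_star) - 1)
    a_star = np.roll(a_star, lag)  # circular shift; fine for a sketch
    # Match overall level by scaling to equal RMS.
    scale = np.sqrt(np.mean(a**2) / np.mean(a_star**2))
    return a - scale * a_star

# A shifted, attenuated copy of a test tone nulls back to (near) silence.
a = np.sin(np.linspace(0, 20, 500))
residual = align_and_null(a, 0.5 * np.roll(a, -3))
print(np.max(np.abs(residual)))  # vanishingly small
```

A fuller version would fine-tune fractional-sample delay and gain to minimize the residual, which is the processor-intensive job alluded to later.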
Playing the null signal using Audacity’s built-in player, it sounded just like a scratchy version of the original with the volume turned down, and with the bass largely missing. This impression was validated playing it on my reference system. These observations confirmed that the differences were not just in the ultrasonic noise spectrum added by the conversion stage to DSD, but were substantially within the audio frequency range, and furthermore were less evident in the deep bass than elsewhere in the audio spectrum. Far too early to say, but these were suggestive (to me at any rate) of an accumulation of phase errors.
I must confess that, given the qualitative data, I expected a better result. The listening tests showed that A and A* sounded very close to one another indeed, yet the null test revealed the presence of quite a substantial difference signal.
We are going to have to repeat this sometime using our own software. Audacity is great, but we didn’t write it, and I don’t even know if it is doing exactly what I am assuming it is at any point in time. Plus, I want to do a more accurate job of level matching and time alignment before nulling, a job which is best done by fine tuning the levels and timings to minimize the magnitude of the null signal. All quite processor-intensive. Also, I would like to use our own SDM to produce the DSD “B” files so that we are at least in control of the choice and characteristics of the filters, but since we don’t yet have our own SDM that aspect remains tricky.
All this to say that the net result of my null test was a null result. But I thought it was at least an interesting peg in the ground. If there is any interest, I may make the resultant files available for download next time (I couldn’t do that this time because I don’t have the rights to distribute the files I was using).
I have been engaged elsewhere over the last two weeks in an on-line exchange on a subject which ultimately boils down to “Why do CD transports sound different?”. For most people, digital audio is quite a straightforward thing. We read a bunch of digital data off a digital storage medium and play that data back through an audio converter. Provided the transport is able to read the data accurately, then there is no basis on which to anticipate that different transports can sound at all different.
As the 1960s rolled into the 1970s the prevailing wisdom regarding turntables was not all that different. Supposedly, all a turntable had to do was rotate at a constant speed and there was nothing more to it than that. Once wow, flutter, and rumble were reduced to levels determined to be undetectable, there was no basis on which to anticipate that different turntables could sound at all different. And then along came Ivor Tiefenbrun, who founded Linn and proceeded to turn the audio world totally upside down with his forcefully-delivered theory that the turntable – yes, the bit that just spins round – was in fact the single most important determinant of sound quality. A theory that he was able to reduce conclusively to practice. The wow, flutter, and rumble argument was consigned to the same dustbin of history into which the “bits is bits is bits” argument is currently tumbling, if it has not already landed.
Ivor’s “dry bones” theory of turntables was quite easy to understand. It depended simply on taking into account mechanical relationships which had hitherto been assumed to be non-contributory. Reading an LP with a stylus is an entirely mechanical process. The motion of the stylus, being in contact with the LP’s surface, is transformed into an electrical signal by the motor assembly built into the cartridge body. However, the LP sits on the turntable’s platter. The platter sits on the bearing. The bearing is bolted to the chassis. The pickup arm’s base is also bolted to the chassis. The pickup arm itself is connected to the arm base via a bearing assembly. Finally, the cartridge body is bolted to the pickup arm. Therefore, the voltage generated by the cartridge reflects not only the relative motion of the stylus in the groove, but also any mechanical detritus that may exist in the “mechanical ground plane” represented by the arrangement of interconnecting elements. As the theory gained ground, so the “mechanical detritus” became better understood in terms of loose joints, vibration, energy storage and isolation. Ivor’s Linn Sondek was considered to be outrageously expensive for a turntable. That said, today’s ultra high end turntables – whatever you may think of them – can sell at prices that would make even Ivor wince.
Applying Ivor’s lateral thinking process to the modern (strange word, that, for something which is already going the way of the dodo) CD transport, we need to look more closely at the processes that we might otherwise assume to be non-contributory. Like the turntable of yore, a CD transport spins a disc and a transducer reads the music off its surface. In this case the music comprises pits impressed into the surface of a layer of metal beneath the protective plastic surface of the disc. To detect these pits, the CD player contains a tiny laser. The laser beam is focussed down onto the metallic surface of the disc, and reflects off it. The reflection is picked up by a photodiode which then outputs an electrical signal in response. The idea is that the pristine surface produces a nice clean reflection, whereas the pit produces a more diffuse reflection. The clean reflection results in more reflected light impinging on the photodiode, and the diffuse reflection off the pit results in less. The electronics behind the photodiode then try to determine whether the reflected signal represents a pit or a clean reflection (strictly speaking, it is each transition between pit and land that encodes a “1”, and the run lengths in between that encode the “0”s).
The whole process is not as clean and tidy as you might imagine. First of all, the actual signal output by the laser is noisy for a whole bunch of reasons. Secondly, the beam has to pass through the plastic protective layer on the CD’s surface, both before and after reflection, and that plastic layer can be scratched and dirty. Thirdly, the beam position is controlled by a servo, which means that it is constantly drifting slightly in and out of alignment with the stream of pits. Then the photodiode itself is noisy. All this noise means that it can be extremely challenging for the electronic circuitry to reliably detect whether the signal represents a 1 or a 0. In fact, it gets it wrong alarmingly often. To deal with this inescapable problem, the CD standard requires the actual data to be encoded using eight-to-fourteen modulation (EFM), together with a Cross-Interleaved Reed-Solomon error-correction code (CIRC). Between them, these add a whole bunch of extra bits to the data stream in such a way that if there is an error in reading an individual bit, it can be detected and automatically corrected. So even though the read-off error rate can be quite high, the actual data is nonetheless very accurately retrieved from the disc. Many people, therefore, will point out quite reasonably that unless the disc is badly scratched, marked, or otherwise damaged, it is fair to assume that the data stream extracted from a CD is essentially accurate.
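The error-correction principle can be illustrated with a far simpler code than the Reed-Solomon coding the CD standard actually uses. A Hamming(7,4) code adds three parity bits to every four data bits, which is just enough to locate, and therefore fix, any single-bit read error:

```python
# Hamming(7,4): a toy stand-in for CD error correction. Four data bits are
# padded with three parity bits so that any single-bit error can be located
# and corrected. (The CD standard uses far more elaborate codes, but the
# principle is the same.)

def encode(d):  # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(c):
    # Recompute the parities; the "syndrome" is the 1-based position of
    # the flipped bit, or 0 if the word arrived intact.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]

word = encode([1, 0, 1, 1])
word[4] ^= 1                  # simulate a mis-read bit off the disc
print(decode(word))           # → [1, 0, 1, 1]  (the data survives intact)
```
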
Aside from getting the ones and zeros correct, a critical aspect of digital audio is timing jitter. The theory underlying digital audio makes the fundamental assumption that the digital samples are converted to analog at the exactly correct sample rate. Slight variations in the sample rate in a real-world system are referred to as jitter. It is therefore important that the data coming off the transport is also synchronized very precisely to the sample rate. However, since the rate at which data is retrieved from the disc is governed by the speed at which the disc spins, this means that the disc’s speed needs to be controlled with phenomenal accuracy. And this is further compounded by the fact that because the pits on the disc have exactly the same spacing from the centre of the disc to the outside, the actual spinning speed of the disc varies dramatically from the start of the disc (the centre) to the end (the outside). In today’s transports, the data is buffered to get around this. The data is read into a buffer at a higher speed than is needed for playback, and the actual output data of the player is then transmitted according to a separate, and highly accurate, clock. Most people, therefore, will point out quite validly that a modern transport’s jitter performance should be decoupled from the mechanism of the rotating platform.
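The buffering scheme can be sketched as follows: the disc side dumps samples into a FIFO in irregular bursts (the spindle servo only has to keep the buffer from running dry), while the output side drains exactly one sample per tick of its own, highly accurate, clock. This is an illustrative simulation, not real transport firmware:

```python
from collections import deque

buffer = deque()   # the FIFO between the disc mechanism and the output clock
output = []

# Samples arrive from the disc in irregular bursts...
bursts = [[1, 2, 3, 4], [], [5, 6], [7], [8, 9, 10], []]
for burst in bursts:
    buffer.extend(burst)                 # disc side: dump whatever was just read
    if buffer:                           # clock side: exactly one sample per tick
        output.append(buffer.popleft())

# ...yet the output emerges in order, at one sample per tick, fully
# decoupled from the rotating platter.
print(output)  # → [1, 2, 3, 4, 5, 6]
```
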
These two issues between them appear to confirm that there is no reason to imagine that two transports should sound significantly different. Unfortunately, experience suggests that CD transports continue to sound different in practice. Without much of an evidentiary basis to enable blame for this to be laid at the door of data errors (even though this claim continues to be made, mostly, it must be said, with no factual basis) the usual culprit is held to be jitter.
Jitter is a very helpful villain when it comes to needing something to blame. The notional jitter sensitivity of digital audio is stupefyingly tight. Yet it applies to the specific timing at which individual sample values are converted from digital to analog. This is a signal which is difficult to measure, since the master clock is not normally externally accessible. Instead, you can measure the jitter inherent in a serial data stream (such as S/PDIF) between transport and DAC, although it is not clear what the relationship would be between the jitter of the data stream and the jitter of the master clock. Also, you can look for measurable artifacts in the analog output of the DAC, which you can then relate to the jitter properties of the master clock, although the underlying theories which are used to derive these relationships are built on highly simplistic assumptions. In short, it is very handy to be able to blame something on jitter, because there is very little in the way of a basis upon which to dispute such an assertion.
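The reason master-clock jitter produces measurable artifacts in the analog output at all can be illustrated numerically: sampling a pure tone at slightly wrong instants converts timing error into amplitude error, scaled by the signal’s slew rate. The figures below (a 10kHz tone, 100ps RMS of jitter) are made up purely for the demonstration:

```python
import numpy as np

fs = 192_000            # sample rate (illustrative)
f0 = 10_000.0           # test-tone frequency (illustrative)
n = np.arange(8192)

ideal = np.sin(2 * np.pi * f0 * n / fs)

# Jittered clock: each sample instant is off by a random amount
# (100 ps RMS here, chosen purely for illustration).
jitter = np.random.default_rng(0).normal(0.0, 100e-12, n.size)
jittered = np.sin(2 * np.pi * f0 * (n / fs + jitter))

# The timing error appears as an amplitude error whose size scales with
# both the jitter and the slew rate (2*pi*f0) of the signal.
error = jittered - ideal
print(np.sqrt(np.mean(error**2)))  # RMS error, roughly 2*pi*f0*jitter_rms/sqrt(2)
```

Note how small the effect is: with these numbers the RMS error sits a little over 100dB below the tone, which is part of why pinning audible differences on jitter is so hard to either prove or refute.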
The notion behind jitter is that a sample arrives early (or late) – maybe by a fraction of a nanosecond – and as a consequence the analog output changes amplitude a fraction of a nanosecond early (or late). The problem is that these circuits just don’t respond unambiguously over those timescales. One nanosecond is one thousandth of a microsecond (which in turn is a millionth of a second). The waveform that is the output of the DAC core needs to change over a period from the end of one sample to the beginning of the next. This is a timescale of the order of microseconds. If you want to determine the precise timing of that change to an accuracy of less than one nanosecond, it means you have to measure it with a bandwidth exceeding 1GHz. This is way up into the RF end of the frequency spectrum. Look at ANY such signal with a bandwidth of 1GHz, and zoom in to a nanosecond-scale resolution, and ALL you will see is noise. This is because RF is all-pervasive. If it wasn’t, none of our radios, TVs, cell phones, WiFi, GPS, or bluetooth devices would work. Stopping RF from infiltrating – and propagating within – electronic circuits is a major, major challenge. Particularly if those circuits have to deal with signals within the RF bandwidth as a matter of design.
In practice, what happens is that the actual waveform over timescales corresponding to sample rates is arrived at by bandwidth limitation. Bandwidth limitation is in effect a big averaging filter – the peaks and troughs of the noise cancel each other out and you are left with the underlying signal. Perhaps the underlying signal does average out to be a tad early or a tad late. But the other thing about noise is that the underlying signal can also average out to be a tad too high or a tad too low (if you know enough about the noise – and the problem is we mostly don’t – you could even predict how often, and by how much, this will happen). I am not sure that it is even possible in any practical sense to separate those two phenomena. In any case, the solution lies in managing the RF noise problem. It can’t be avoided, because the inside of a CD transport is an inherently RF-rich environment. Many DAC manufacturers are already addressing this with various degrees of sophistication. I suspect they still have a lot further to go.
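The averaging effect is easy to demonstrate numerically: averaging N uncorrelated noise samples shrinks the noise by roughly the square root of N, but the residual never averages out to exactly zero – each averaged value still lands a tad high or a tad low. A toy illustration with NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.normal(0.0, 1.0, 1_000_000)   # unit-variance noise samples

# Bandwidth limitation acts like a big averaging filter: here we average
# non-overlapping blocks of N samples each.
N = 100
averaged = noise.reshape(-1, N).mean(axis=1)

print(noise.std())     # ~1.0
print(averaged.std())  # ~0.1, i.e. a factor of sqrt(100) smaller

# The averaged values are never exactly zero - each one has settled a tad
# high or a tad low, just as a filtered edge settles a tad early or late.
```
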
Going back to the turntable again, the problem is to read the analog undulations in the surface of a plastic disc and represent them as an analog voltage. The solution was to eliminate every possible interference from the unavoidable mechanical elements of the design. It is the same problem in the digital world. We have to read the digital undulations in the surface of a plastic disc and represent them as a digital voltage. Except this time it is not mechanical interference, but RF electrical interference we have to worry about.
When we say something is digital, it is not really sufficient to say that it deals with ones and zeros. It deals with situations where we have no need to take into account the possibility of something taking on a value other than one or zero – or, more accurately, taking on values that can always be expressed in their totality using arrangements of multiple ones and zeros. Digital signals behave in a logical fashion, and represent a logical, ordered, bounded state of affairs. Once those constraints fail to apply, then we are no longer looking at a digital signal. ALL of our digital signals are naturally contaminated with RF noise. Understanding the behaviour of a digital data stream contaminated with RF noise requires treating it as an analog waveform.
Our challenge is to seriously reduce the RF noise from our digital environments, as Ivor Tiefenbrun did with mechanical noise in his turntable designs. In doing so, we must bear in mind that we can never eliminate it. Just as even the very best turntables of today, while sounding indisputably better than their forebears of yesteryear, still do manage to sound subtly different, so digital transports – in fact digital sources of every stripe – will always continue to sound slightly different, even if by increasingly smaller degrees.
I suppose I have never really thought about this before. If you are in the habit of attending live orchestral concert performances, the chances are that you will never hear the same orchestra, with the same conductor, playing in two different concert halls. When you go to see your local orchestra it is always in your local concert hall. Sometimes you attend your local concert hall to see a guest orchestra. Or even your local orchestra with a guest conductor. When traveling, you might visit another concert hall and see another orchestra. It is pretty rare that you will go on a road trip and take in a recital by your home orchestra.
Here in Montreal, the Orchestre Symphonique de Montréal is now in its second season in its new, purpose-built digs, the Maison Symphonique de Montréal, located right next door to its home for the previous 40-odd years, the Salle Wilfrid Pelletier. We were there for the first time last night for a performance which included Ravel’s orchestral arrangement of Le Tombeau de Couperin, the Sibelius Violin Concerto, and Stravinsky’s Petrouchka. Unusually for an orchestral concert, the ensemble also provided an impromptu encore of Debussy’s Prélude à l’après-midi d’un faune.
More than anything else, the occasion prompted me to consider the effect of the venue on the sound of an orchestral performance. The fact is, I was quite stunned at the magnitude of the difference between what I heard last night and what I have been used to hearing over the years in the Salle Wilfrid Pelletier. Perhaps not surprisingly, the difference was akin to listening to two radically different loudspeakers. Listening to music in the new Maison Symphonique de Montréal, it seemed to me, was like listening to music on our Stax SR009 electrostatic headphones. The absolute tonal neutrality, the extraordinary level of detail, and even the apparently constrained sense of dynamics are exactly what I am used to hearing when I don the electrostatics. By comparison, the old Salle Wilfrid Pelletier has the more enveloping, warm, dynamic, smile-inducing sonic character of our B&W 802 Diamonds. It is said that Dave Wilson voices his loudspeakers according to his experiences in Vienna’s Concert-Verein, a venue he reportedly attends on a regular basis. For sure, Wilson speakers all conform to a recognizable “family” sound.
The new Maison Symphonique de Montréal does indeed have quite remarkable acoustics. And it is new enough that it may be a few years yet before they finish fine-tuning it. But the astonishing level of detail is both a boon and a problem. When maestro Nagano’s foot slides as he gestures from one side of the podium to the other, I hear it. From the middle seat of the back row. Every chair scrape. Even, once, I heard the rustle of paper as the violin soloist turned the page on his score during a quiet interlude. Then there are other intrusive sounds that I can’t quite identify. Can that possibly be a cello playing waaaay out of tune? Surely not. Can I hear people talking in the room next door? Again, surely not. But every last cough, unwrapping of a cough drop, clearing of the throat, all these things come across more clearly – and, I must observe, more obtrusively – than anything I have become accustomed to over the years. I found myself wanting to continuously turn up the volume, just as I do when listening on the electrostatics. Until we got to a loud bit, that is.
And so I found it rather odd. We audiophile types tell ourselves all the time that the ultimate objective is the ability to accurately reproduce the sound of a live acoustic event. And, in truth, it is a laudable objective, and one that is hard to argue cogently against. And yet there I was, suddenly appreciative of both the extent to which live acoustic events differ in their own sonic presentation, and the nature of those differences – even when the performers, and, one might hope, the performances, are the same – in almost exactly the same way that loudspeakers, argued by many to be the most critical element of an audio system, differ in their sonic presentation.
If the sound of music heard in the Maison Symphonique de Montréal can sound so dramatically different from the sound of the same music heard in the Salle Wilfrid Pelletier, can one of those sounds be “right”? And can the other be “wrong”? Or is the Vienna Concert-Verein the only “right” one?….