Recently I have been conducting some rudimentary tests, not really having a firm idea where I was going with them, but sometimes good things pop out of the woodwork while you’re actually looking for something else. Unfortunately, nothing useful popped out of the woodwork on this occasion, but if nothing else, I thought I’d get a post out of it… 🙂
As those of you who read these posts regularly will know, we are on the brink of releasing a brand new product whose core competence is to produce high quality PCM versions of DSD files. The product is done – it’s just the nuts and bolts of the product launch that we are fiddling with. In developing this product, part of its optimization has been achieved purely on a mathematical basis, relying on detailed measurements and models. But, as always, the final product is fine tuned based upon what we hear, regardless of how it measures.
One thing that we consistently hear is that DSD versions of a track usually sound better than their PCM counterparts. Occasionally, there is not much to choose between them, but we rarely seem to get PCM that sounds better than its DSD counterpart, and I have spent a long time poring over reasons why that might be. I have some arm-waving ideas, but they are not yet well-developed enough for prime time, and will need major development at our end if they are ever going to get there. In the meantime, I confine myself to looking at the things I am able to look at, to see if something interesting just happens to jump out. Not much has, as yet.
Surprisingly, one of the things I have not done yet is the null-transform test. This is where you take a waveform A and transform it into another waveform B. Then, you transform B back into a copy of A, which I will call A*. It is easy to show whether A and A* are identical. You simply invert one of them and add the two together. If the two waveforms are identical, then the result will be absolute digital silence (a “null transform”). If the two are not identical, then the resultant waveform will comprise the differences between the two. Examining, or even just listening to these waveforms, can often tell you a lot about the nature of the differences. If A is a WAV file and B is a FLAC file, then the result should be a true null transform, where A and A* are identical. But if B is an MP3 file, then the result will most certainly not be null. I set about initiating a transform where B is a DSD file, just to see what gives.
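For anyone who wants to try the invert-and-add step outside an audio editor, here is a minimal sketch in Python with NumPy. It assumes A and A* have already been loaded as floating-point sample arrays of equal length; the function names `null_test` and `residual_db` are made up for illustration and are not part of any tool mentioned in this post.

```python
import numpy as np

def null_test(a, a_star):
    """Invert a_star and add it to a, returning the difference signal."""
    a = np.asarray(a, dtype=np.float64)
    a_star = np.asarray(a_star, dtype=np.float64)
    if a.shape != a_star.shape:
        raise ValueError("A and A* must have the same number of samples")
    return a + (-a_star)

def residual_db(residual, reference):
    """RMS level of the residual relative to the reference, in dB."""
    rms_res = np.sqrt(np.mean(np.square(residual)))
    rms_ref = np.sqrt(np.mean(np.square(reference)))
    if rms_res == 0.0:
        return float("-inf")  # a true null: absolute digital silence
    return 20.0 * np.log10(rms_res / rms_ref)
```

A lossless round trip should give a residual of all zeros. By contrast, a copy that differs only by a 1% gain error (`0.99 * a`) already leaves a residual about 40dB below the original, which is why level matching matters so much in this kind of test.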
I decided to start off with a high-quality 24-bit 44.1kHz PCM file. I then up-sampled it to 24-bit 176.4kHz PCM using the SoX Linear SRC engine. I did that to ensure that there is no part of the signal within the frequency range where DSD’s shaped noise floor begins to rise sharply. That became my reference A file. I ran it through our ultra-high resolution (-300dB noise floor) FFT analyzer to make sure it contained no rogue frequency peaks, and sure enough it did not. I next used Korg Audiogate to create a DSD128 (5.6MHz) DSD B version. Finally I used DSD Master to create a 24-bit 176.4kHz PCM A* copy.
Listening to all three copies, A, B, and A*, I was struck by how similar to one another they all sounded. Frankly, I did wonder whether I would be able to tell them apart in a blind test, although some differences did emerge. I felt A had a touch more ‘sparkle’ to it. But the interesting part would be to do the null test by inverting A* and adding it to A. This presented some tricky problems. First, if there is any net gain (or loss) in the transformation, then this will show up massively as a difference in the null test and we don’t want that. Unfortunately, although Korg’s Audiogate does have a “gain” setting, this does not (for various entirely legitimate reasons) necessarily translate to an absolute peak signal value in the DSD file. And since I cannot be sure what the signal reference level is in the DSD “B” version, I can’t set a correct gain setting in DSD Master. So I left that on “Normalize”, which produces an A* PCM file normalized to 0dB – in other words with the maximum possible resolution.
Loading the A and A* files into Audacity to do the null transform, it was a simple matter to invert A*. Then I needed to “normalize” the A file so that both files peaked at the same level. Finally, I needed to time align the two files. For various reasons, the transforms had left the files non-aligned temporally. By looking for sharp peaks in the music waveform (fortunately I was able to find one quite easily), I was able to use Audacity’s drag tool to visually align the signals to the nearest sample. Finally, I nulled the two files together. Immediately, Audacity showed me that there was a very substantial residual. Looking at Audacity’s FFT of the null signal, I could see that its 20Hz-20kHz band had essentially the same shape as the spectrum of the original A file, but was depressed by about 40-60dB. There was nothing in the spectrum of the null signal that seemed to be telling me anything obviously useful.
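The normalize-and-align steps can also be done numerically rather than by eye. The sketch below is not what I did in Audacity: it swaps the visual peak-matching for a cross-correlation, which finds the whole-sample offset at which the two signals line up best. The helper names are hypothetical, and the inputs are assumed to be mono NumPy arrays.

```python
import numpy as np

def estimate_delay(a, b):
    """Number of samples by which b lags a (negative if b leads)."""
    corr = np.correlate(b, a, mode="full")
    return int(np.argmax(corr)) - (len(a) - 1)

def align(a, b, delay):
    """Trim both signals so that sample n of each refers to the same instant."""
    if delay >= 0:
        b = b[delay:]
    else:
        a = a[-delay:]
    n = min(len(a), len(b))
    return a[:n], b[:n]

def peak_normalize(x):
    """Scale so the largest absolute sample value is 1.0."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def aligned_null(a, b):
    """Align to the nearest sample, peak-normalize both, then invert and add."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a, b = align(a, b, estimate_delay(a, b))
    return peak_normalize(a) - peak_normalize(b)
```

Note that `np.correlate` in "full" mode is slow on whole tracks, so in practice one would correlate a short excerpt or use an FFT-based correlation. Like the drag tool, this only aligns to the nearest sample.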
Playing the null signal using Audacity’s built-in player, it sounded just like a scratchy version of the original with the volume turned down, and with the bass largely missing. This impression was validated playing it on my reference system. These observations confirmed that the differences were not just in the ultrasonic noise spectrum added by the conversion stage to DSD, but were substantially within the audio frequency range, and furthermore were less evident in the deep bass than elsewhere in the audio spectrum. Far too early to say, but these were suggestive (to me at any rate) of an accumulation of phase errors.
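To put a rough number on how small a timing or phase error needs to be to produce a residual like this, consider a single sine wave nulled against a copy of itself delayed by a time tau. The difference is another sine wave whose amplitude, relative to the original, is 2·|sin(π·f·tau)|. The sketch below simply evaluates that expression. It is purely illustrative, the half-sample delay is an assumed figure, and it is not a diagnosis of what actually happened in my files.

```python
import math

def delay_residual_db(freq_hz, delay_s):
    """Level (dB re. the original) of the null of a sine against a delayed copy."""
    amplitude = 2.0 * abs(math.sin(math.pi * freq_hz * delay_s))
    return 20.0 * math.log10(amplitude) if amplitude > 0 else float("-inf")

# Assumed example: a residual misalignment of half a sample at 176.4kHz
half_sample = 0.5 / 176400.0
for f in (50, 500, 5000):
    print(f, "Hz:", round(delay_residual_db(f, half_sample), 1), "dB")
```

For small errors the residual rises by about 6dB for every doubling of frequency, so a fixed timing offset would leave a null signal that follows the spectrum of the music but is weakest in the deep bass.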
I must confess I expected a better result from the perspective of the qualitative data. The listening tests showed that A and A* sounded very close to one another indeed, yet the null signal showed the presence of quite a substantial difference signal.
We are going to have to repeat this sometime using our own software. Audacity is great, but we didn’t write it, and I don’t even know if it is doing exactly what I am assuming it is at any point in time. Plus, I want to do a more accurate job of level matching and time alignment before nulling, a job which is best done by fine tuning the levels and timings to minimize the magnitude of the null signal. All quite processor-intensive. Also, I would like to use our own SDM to produce the DSD “B” files so that we are at least in control of the choice and characteristics of the filters, but since we don’t yet have our own SDM that aspect remains tricky.
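As a sketch of what "fine tuning the levels and timings to minimize the magnitude of the null signal" could look like, the code below tries a small range of whole-sample shifts and, for each one, computes the gain that minimizes the residual in the least-squares sense. This is an assumption about one reasonable approach, not a description of our software; the names `best_gain` and `minimize_null` are hypothetical, and sub-sample alignment (which would need fractional-delay interpolation) is not attempted.

```python
import numpy as np

def best_gain(x, y):
    """Gain g minimizing sum((x - g*y)**2), i.e. <x,y> / <y,y>."""
    denom = float(np.dot(y, y))
    return float(np.dot(x, y)) / denom if denom > 0 else 0.0

def minimize_null(a, b, max_shift=8):
    """Search integer shifts of b against a; return (shift, gain, residual)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    best = None
    for shift in range(-max_shift, max_shift + 1):
        # A positive shift means b lags a by that many samples
        x, y = (a, b[shift:]) if shift >= 0 else (a[-shift:], b)
        n = min(len(x), len(y))
        if n == 0:
            continue
        x, y = x[:n], y[:n]
        g = best_gain(x, y)
        residual = x - g * y
        energy = float(np.mean(residual ** 2))  # mean, so lengths compare fairly
        if best is None or energy < best[0]:
            best = (energy, shift, g, residual)
    _, shift, gain, residual = best
    return shift, gain, residual
```

The exhaustive search is the processor-intensive part: each candidate shift costs a full pass over the track, which is why one would want to narrow the range first with a coarse alignment.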
All this to say that the net result of my null test was a null result. But I thought it was at least an interesting peg in the ground. If there is any interest, I may make the resultant files available for download next time (I couldn’t do that this time because I don’t have the rights to distribute the files I was using).