Here is a beautifully written piece about a present-day composing sensation from Japan. Deaf like Beethoven, a second-generation survivor of Hiroshima, and something of a cult figure in his own right, Mamoru Samuragochi was a highly popular composer of music for video games who became a recent sensation with his First Symphony, a work hailed as one of true genius. Finally, Japan had produced a classical composer fit to be revered in the company of Mahler, Bruckner, Beethoven and the like.

Except that it wasn’t actually like that at all. Samuragochi – a master of marketing and self-promotion – had in fact paid a musical prodigy with chronically low self-esteem to write his music for him, and to stand back while Samuragochi took all the credit. But more than that, Samuragochi’s deafness was another fabrication – conjured up to spare him from having to answer awkward questions at press conferences.

In today’s world, lies of this magnitude, coupled with both success and a huge public profile, cannot be kept under wraps for very long. Even so, many of the corporate and institutional organizations that had hitched their wagons to the Samuragochi juggernaut decided that turning a matching deaf ear of their own to the emerging shouts of protest would be their preferred course of action.

That this whole drama played out in Japan, a country whose culture is so distinctive, and so different from what we in the west like to think of as “normal”, adds its own particular spice to the whole story.

Please read Christopher Beam’s well-written piece in The New Republic, which provides a combination of wonderfully intriguing characters, Shakespearean tragedy, cultural back story, and even hints of a twisted coda yet to be played out. Frankly, I see this as magnificent, classical, absolutely first-rate operatic material. Thomas Adès, are you reading?…

http://www.newrepublic.com/article/121185/japans-deaf-composer-wasnt-what-he-seemed

There was a post the other day on the Audio Review web site by Brent Butterworth.  In it, he laments how, on the one hand, audiophiles are falling over themselves swooning over DSD, while at the same time Class-D amplifiers receive short shrift.  After all, he tells us, DSD and Class-D are the same thing.  Reading between the lines, he sees this as some kind of giant marketing faux pas.

Well, sorry, but DSD and Class-D are most definitely not the same thing.  In his post he suggests: “Put bigger transistors, a bigger power supply, and larger filter components on the end of a DSD DAC …. and what have you got?  A Class-D Amp.”  Well, if life really were as simple as that, Class-D amplifiers might indeed sound great – or at least better than they do now.  But it is not at all the same thing.  Let me explain why.

I have already explained elsewhere how DSD DACs function, and what they have to do to sound as good as they do.  But, in order that this post can stand on its own two feet I am going to go over the key points once more.

DSD is a 1-bit binary bitstream running at a frequency of 2.8224MHz.  In order to convert it to analog, all we have to do is pass the bitstream itself directly through an analog low-pass filter, and the result is music.  It really is as simple as that.  Except, that is, if you want the ultimate in sound quality.  In that case, what you find is that the filters required for that task are not as benign as you might like.  They tend to be similar to the anti-aliasing filters required to be inserted into the signal path prior to 16/44.1 PCM encoding, and tend to endow the sound quality with many of the same characteristics.  To get around that, virtually all DSD DACs up-convert the incoming DSD to a ‘variant format’ of DSD with a higher sample rate, and quite often with an increased bit depth of 3-5 bits.  It is that ‘variant format’ of DSD which is passed into the low-pass filter during analog reconstruction.

With the ‘variant format’ of DSD, we can specify a filter whose characteristics are not so aggressive, and which has, as a result, better sound quality.  It was always so, even with the first SACD players introduced to the market some 15 years ago.

But what does it mean in practice, to pass a digital bitstream through an analog filter?  This is easiest to describe when we confine ourselves to a pure 1-bit bitstream.  The waveform is a pseudo-square wave, which is either at its maximum when the bitstream reads ‘1’, or at its minimum when the bitstream reads ‘0’.  In a DAC chip, those maxima are of the order of +1 Volt and the minima of -1 Volt.  So in a DSD DAC we would be generating a pseudo-square wave whose voltage varies rapidly between plus and minus one Volt.
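
To make that picture a little more concrete, here is a toy sketch in Python (using NumPy and SciPy – purely illustrative, and not anything lifted from a real DAC design).  It maps a 1-bit pattern onto a ±1 V pseudo-square wave and runs it through a low-pass filter; the filter output settles at the local average of the bitstream.

```python
# A toy illustration (not taken from any real DAC) of the idea above: a 1-bit
# pattern is mapped onto a +/-1 V pseudo-square wave and passed through a
# low-pass filter.  The filter output settles at the local average of the
# bitstream - three blocks with duty cycles of 75%, 50% and 25% come out at
# roughly +0.5 V, 0.0 V and -0.5 V respectively.
import numpy as np
from scipy import signal

bits = np.concatenate([np.tile([1, 1, 1, 0], 1000),     # 75% ones
                       np.tile([1, 0], 1000),           # 50% ones
                       np.tile([1, 0, 0, 0], 1000)])    # 25% ones
wave = np.where(bits == 1, 1.0, -1.0)                   # the +/-1 V pseudo-square wave

# A gentle low-pass set well below the bit rate (cutoff at 1% of Nyquist)
sos = signal.butter(4, 0.01, btype="low", output="sos")
recovered = signal.sosfilt(sos, wave)

print(round(float(recovered[2000:4000].mean()), 2))     # ~ 0.5
print(round(float(recovered[5000:6000].mean()), 2))     # ~ 0.0
print(round(float(recovered[8000:10000].mean()), 2))    # ~ -0.5
```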

The differences between a DAC and a Power Amplifier are twofold.  A DAC is a low-power device whose output is of the order of a few hundred milliwatts, with a maximum voltage swing of the order of a volt.  By contrast, a Power Amplifier is a high-power device, whose output is of the order of hundreds of Watts, with a maximum voltage swing of many tens of volts.  This places some significant demands on the circuit whose job it is to feed the appropriate pseudo-square wave into the analog reconstruction filter.  Instead of switching the voltage between plus and minus one volt it now has to switch between plus and minus a hundred volts (give or take).  And instead of those voltages carrying a few hundred milliamps, they must now carry a few Amperes of current.

What happens when you start switching a 100-volt line carrying several amps of current on and off at a frequency of several MHz?  The answer is that you generate massive quantities of radio-frequency energy.  In fact, the chances are good that you will jam everybody’s radio for a distance of several blocks.  That can get you into a lot of trouble.  But even if you could fix that particular problem – which you can, at some cost – you still require some pretty sophisticated (read expensive) components to switch that kind of signal at that kind of frequency.  Frequency is the big problem here.  The higher the frequency, the worse the problem gets.  So, to answer Brent Butterworth’s question – “Put bigger transistors, a bigger power supply, and larger filter components on the end of a DSD DAC …. and what have you got?” – the answer is: “A radio station.”

So how does a Class-D amplifier work, then?  The answer is that it grabs the bull by the horns and, instead of moving the frequency UP, moves it DOWN.  Using a sigma-delta modulator, a Class-D amplifier remodulates the incoming signal to a lower sample rate than that of DSD, but to preserve the integrity of the signal it uses a bit depth of more than 1 bit.  I don’t want to get too technical at this point, but how it does that takes it into a different realm.  It increases the effective bit depth not by using it to encode the intensity of each pulse in the waveform, but rather the width of the pulse.  In effect it encodes the output of the sigma-delta modulator not with a “Pulse Density Modulation” (PDM) representation (which is what DSD is) but with a “Pulse Width Modulation” (PWM) representation.
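
Here is a toy illustration of the distinction, in Python.  The numbers (eight slots per frame, a level of 0.5) are purely illustrative – a real Class-D stage runs its frames at a few hundred kHz with far finer width resolution – but it shows how the same value can be carried either by the density of the ones or by the width of a single pulse.

```python
# A toy comparison (purely illustrative, not a real modulator) of the two
# representations described above.  Both frames encode the same level - 0.5 of
# full scale - over eight output slots.  In the PDM frame the ones are spread
# through the frame and their density carries the value; in the PWM frame the
# same number of ones is gathered into one pulse whose width carries the value.
import numpy as np

level = 0.5                                  # normalized level to encode, 0..1
slots = 8
ones = int(round(level * slots))

pdm = np.zeros(slots, dtype=int)
pdm[np.linspace(0, slots, ones, endpoint=False).astype(int)] = 1
pwm = np.array([1] * ones + [0] * (slots - ones))

print(pdm, pdm.mean())   # [1 0 1 0 1 0 1 0] 0.5
print(pwm, pwm.mean())   # [1 1 1 1 0 0 0 0] 0.5
```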

By using this PWM approach at a much lower frequency – usually in the high hundreds of kHz – the switching can be accomplished using affordable components, and we have less of a problem with RF emissions.  But we still have the analog filtering to do, and this remains an area of concern for ultimate sound quality.  The other issue is whether or not the PWM switching can maintain the linearity and distortion performance necessary for high-end audio applications, while still delivering the amount of current that loudspeakers typically consume.  It is this area which is ripest for exploitation through intelligent and innovative engineering solutions.

Today, Class-D amplifier technology is making serious inroads.  For non-audiophile applications they are beginning to rule the roost.  Even in the audiophile sphere, a number of quality Class-D amplifiers are now on the market, and while there are notable exceptions which do deliver seriously impressive sound quality (I’m looking at you, Devialet!), they remain largely confined to the low-to-mid-range price/performance tier.  But note that even for serious audiophile applications, Class-D amplifiers have totally revolutionized powered subwoofer technology.  All things considered, at the present rate of progress, I can see Class-D dominating even the high-end market before too long.  But not for the reasons Butterworth suggests.

All that said, we’re not there yet.  And don’t forget that, even today, the market is still very much alive with vacuum tube power amplifiers.

It is surprising how little-understood patents are.  Ever since 1989 I have managed to find myself responsible, at one time or another, for the patent portfolios of each of the companies I have worked for.  I am also an inventor on several issued patents.  So I know a little bit about the subject – at least enough to know that most people hold onto a number of misconceptions.  I thought it would be instructive to post an overview of the key issues pertaining to patents.

First of all, it is important to understand what can be patented and what can’t.  A patent must describe either a specific thing, or a specific method of making a thing.  By specific, I mean that the inventor must clearly describe exactly what constitutes that thing, and provide clear criteria that permit the reader to distinguish the invention from something which is not covered by the patent, leaving as little as possible in the way of a “grey area”.  Generally, there must be an ‘inventive step’ – a critical point at which the invention departs from what was previously known (what we call ‘prior art’).  Also, a patent must contain ‘full disclosure’.  In other words, it must contain everything that a person skilled in the art would need to know to be able to successfully replicate the invention.  The inventor must not withhold some key “secret sauce” from the patent disclosure.  Weaknesses in these areas can result in a patent having its applicability restricted, or even being invalidated, at some point in the future.

Patents can only cover something which has been invented rather than created, so, for example, you can patent a clever new way of making words appear on paper, but you cannot patent the words themselves – those you might consider copyrighting.  You cannot patent your logo or business name – those you might consider trademarking.  You cannot patent your customer list – that would be a trade secret.  And you cannot patent all the brilliant things that could be done “if only someone would invent such-and-such a thing”.  Those you could write a science-fiction novella about.

Finally, the patent should disclose who invented the invention.  There can be multiple inventors, but if so, each listed inventor must be able to point to a critical aspect of the inventive step for which they are responsible, and all of the actual inventors must be included in the patent.  Just being the owner of the company for whom the inventor worked does not entitle you to be listed as an inventor [It is said that Elena Ceaușescu, wife of the communist dictator of Romania, had her name included as an inventor on all Chemistry patents issued in Romania].  It is not unusual for all rights in the patent to be assigned to a third party, usually the employer of the inventor(s), although the inventors’ explicit consent is required for this to happen.

The structure of a patent comprises two parts, the Specification and the Claims.  The Claims are the most important section of the patent.  Only what is claimed in the Claims is protected by patent law, and this is a very important distinction.  The claims describe, in a manner set forth both by statute and by common practice, exactly what has been invented.  Someone reading them might well be unclear on exactly what those dry and clipped claims actually describe, and, taken on their own, there may be some ambiguity about what their specific language refers to.  Therefore, patents also include a Specification section in which the invention is described in detail, in the context of the pre-existing state of the art, and some examples of specific embodiments of the invention are provided.  In that way the Claims are designed to be read and interpreted in the light of the Specification.

It is important that the claims of a patent describe only the new inventions for which the inventor is seeking protection.  If a claim describes something which already existed before the patent application was submitted – a circumstance known as ‘prior art’ – then that claim, and in some circumstances the entire patent, can be held to be invalid.  This will be the case regardless of whether or not the inventor was aware of that prior art.  Often there may be some ambiguity as to whether a key aspect of a claim is or is not anticipated by a certain item of prior art, and this may be known and understood by the inventor.  In that case, the Specification will typically include material identifying such prior art and explaining why and how the inventor’s claims are distinct and different.

Once you have written your patent, it must be submitted to the patent office for approval.  The patent office will assign it to a patent examiner who will make a cursory, but intelligent, examination of your patent and will attempt to establish whether or not the submitted document meets all of the requirements to be granted as a patent.  He may question whether the disclosure is complete.  Or he may raise specific objections based on existing patents or other publications which he considers may describe prior art.  You will then have the opportunity to address those objections and either re-submit the patent or provide clarifying information to the examiner.  This can go back and forth many times, or not at all.  The details of all such back-and-forth dialog will stay in the patent’s history file, and may be referred to in future if the patent is ever challenged in court.  In any case, the end result of the process is that in due course the patent is usually issued.  A patent only comes into force after it has been issued.

Once issued, the patent has a severely restricted lifetime.  In the US this is 20 years from the date when the patent was first filed with the patent office, regardless of how long it may have spent going back and forth with the examiner’s office prior to issuance.  Once the patent expires, it no longer conveys any protection whatsoever.  There are no ways to get around that.

One of the big mistakes that people make in regard to patents is to over-estimate the value of an issued patent.  All issuance demonstrates is that the examiner has been persuaded that the inventor has met the requirements for a valid patent.  It does NOT guarantee that the patent actually does meet those requirements!  Which can come as a big surprise to someone who has shelled out a lot of money to get to that point.

So what use is a patent then?  In reality, if you are the owner of a valid, issued patent, it gives you a legal basis on which to approach a third party who may be infringing that patent against your wishes and ask them to either stop doing so or purchase a license.  Generally, what happens next depends on whether the third party is bigger than you, and has greater financial clout.  If the party continues to ignore your entreaties, you will have the right to sue them for patent infringement.  Always bear in mind that knowing someone is infringing your patent rights is one thing – but it may be rather more difficult to prove it in a court of law.

A court of law is the only place where the ultimate blessing of validity can be bestowed upon a patent.  This is where you end up if you sue somebody – or if somebody sues you – for patent infringement.  A court of law can do what the patent examiner does not.  It can examine the patent in minute detail and pronounce with finality on whether the patent is or is not valid.  It can choose to limit the patent’s validity or invalidate it entirely.  In rare cases it can order the patent to be re-submitted with additional material in order to expand its validity.  A patent whose claims have been upheld in court can no longer be challenged.  The owner of a patent whose claims have been upheld (or even declared invalid, for that matter) in a court of law will also be up to $10M lighter in the wallet.  Yes, Doris, that was an ‘M’.  A patent infringement lawsuit is not for the faint of heart.

In the US, the doctrine of triple damages applies.  This means that if you infringe on the patent rights of a third party, in full awareness of those rights, then you will be liable not just for the damages you are held to have caused, but for three times that amount.  The fear of incurring triple damages ensures that even large and powerful entities will take a patent infringement lawsuit seriously, because triple damages present even a penurious plaintiff with an opportunity to seek serious legal representation on a contingency basis.

We’re still not done yet.  You have yet to decide where, geographically, you want your patent to have force.  If you have a US patent, for example, your competitors in Germany, Japan, China, etc., can freely and legally enjoy full use of your patented inventions.  Your only remedy may be to stop them from importing infringing products into the US.  If you want the protection of your patent to extend to other countries of the world, then you have to file for patent protection in those countries too.  But be aware that your patent rights, the degree of protection offered, and the remedies available against infringement, may be different in each country.  Filing internationally gets to be very expensive, since your patent will usually need to be translated into each country’s native language, and brought fully into compliance with each country’s patent codes.  Also, you cannot sit on your hands and see how things work out before deciding whether to file internationally.  You have to make that decision up front (in practice, there is a mechanism that can give you up to 12 months of breathing room for some countries, but that’s all, and it’s not much).

Finally, what does all this cost?  First of all, you will benefit from the services of a good patent attorney.  Yes, they charge up to $400 an hour, but there are good reasons for that.  I wouldn’t dream of filing a patent without the assistance of top quality counsel, and indeed I never have.  There are so many ifs and buts when it comes to costs, but I will give you two pegs in the ground that I think are fair.  To get to the point where you have a high-quality issued US patent will cost you $10k to $20k.  If your ambitions are international, a fully issued patent portfolio in a basket of countries in which a technology-oriented company might wish to do business will set you back $150k to $250k per patent.  That’s not chump change.  Remember, this is to arm yourself with a single issued patent, which may be willfully ignored by someone who doesn’t think you have the balls to sue them, or which may be shown to be invalid – whether in court, or in one of those “oh dear” moments when you open a letter containing a sheaf of technical documents that you wish had come to your attention before committing to all the expense of filing the patent in the first place!

At this point, a quick detour into good business practice.  Because of the doctrine of triple damages, companies and individuals should always make it a strict policy that nobody (but NOBODY, other than in-house counsel) should ever read the Claims of any patent of which they are not the author.  If such a policy is carefully implemented in practice, then it follows that, legally, neither the person nor the entity can possibly be aware of any infringement of any patent.  Since only the Claims describe what is patented, even if you have read the Specification section of a patent which you are accused of infringing, if you haven’t read the Claims you cannot know what has actually been patented.  This may sound devious – let’s be honest here, it IS devious – but if you retain a blue-chip patent attorney this is the first lesson that he will hammer into you.  In practice it is not that hard to do, since claims make for very dry reading anyway.

Patents exist purely as a ‘barrier to entry’.  They are a barrier that obstructs your competitors from entering your line of business.  In that sense they are no different than the padlock you put on the factory’s front door when you go home for the weekend, or the insurance policies that you pay for once a year.  Like the padlock and the insurance, you need to understand what you are protecting yourself against, what the costs are of indulging yourself in that degree of protection, and what risks you run in not doing so.  There are exceptions to every rule, but for most small businesses – and I think all audio businesses are small businesses – patents are very rarely a justifiable form of protection.  But when telecom giant Nortel’s assets were sold off in 2011 following their bankruptcy, their patent portfolio was sold for $4.5B, in cash.  Yes, Doris, that’s a ‘B’.

This post is nothing so much as some extended thinking aloud on the subject of the audibility of phase.  I have written before about how phase relationships can profoundly affect the actual waveform of a complex sound even though the frequency content remains unaltered.  Experiments using synthesized sounds to determine whether those phase-induced changes are actually audible are unsatisfactory.  I personally am totally unable to hear any difference between the sounds of different tracks where all I do is vary the phase content.  But this doesn’t really prove much, because the human brain is not well adapted to discern subtle differences in synthetic, non-real-world sounds.  Remember – the EARS listen, but the BRAIN hears.

A great, and very valid, point of reference is the ultra high-end loudspeaker range from Wilson Audio (whose “entry-level” models cost more than my daughter paid for her two-year-old Ford).  The top-end models include a facility for adjusting the positioning of the mid-range and tweeter units.  The idea, as claimed by Wilson, is to permit fine adjustment of the “time alignment” between the treble, mid, and bass drivers.  That such adjustments should have an audible effect is not surprising since, most obviously in the crossover regions, the signal reaching the listener is a combination of signals emitted by more than one driver.  The “time alignment” of those signals can make the difference between them reinforcing one another, or trying to cancel one another out.  Those effects will manifest themselves in aspects of the speakers’ measured frequency response.  But beyond that, these adjustments have the effect of fine-tuning the phase response of the speakers, at least to some degree.

What effect do these adjustments actually have?  I can tell you from personal experience that they are most effective.  And it is not a question of optimizing the tone colour for ‘naturalness’, as you might presume if the effect you were hearing were that of the phase reinforcement/cancellation effects alone.  No, what I heard, and what everyone else I have spoken to who has listened for themselves has reported, is that when the adjustments are ‘just right’ the whole soundstage seems to suddenly snap into focus in a way that only the Big Wilsons seem able to command.

This is personally interesting because a good 30 years ago I bought my first ever pair of seriously high-end loudspeakers, the Advanced Acoustic Designs ‘Solstice’ model produced by Colin Brett, a one-man operation whose day job was as owner of the local shaver repair shop.  Inspired by the now-legendary Dahlquist DQ-10, Colin designed a speaker with a sealed bass unit, above which he mounted an open-frame midrange unit and tweeter.  The open-frame units were progressively set back from the front of the bass unit’s baffle in order to provide a degree of time-alignment.  By the time I came on the scene he had completed that phase of the design by mounting a selection of differently cut frames and listening to how they each sounded.  I, on the other hand, wanted to hear this for myself, and suggested that he repeat the experiment this time using a pair of staggeringly precise piezo-electric slides, which I could conveniently borrow from where I worked.  Sadly, that experiment never came about.  I still have my pair of Solstice loudspeakers in my basement, although one of the mid-range units, long since out of production, has gone to meet its maker.

Just how much ‘time alignment’ do the Big Wilsons provide for, and how significant might you expect that to be?  The full range of adjustment is confined to something like a couple of millimetres (by my estimation).  That’s about one tenth of the wavelength of a 20kHz sound wave.  The process of homing in on the ‘right’ position involves setting it to within what looks like less than 1/10 of a millimetre.  It seems a little surprising that mechanical adjustments of that order are necessary to fine-tune the temporal response of a loudspeaker, but for the sake of this discussion let’s take it at face value.  The adjustable Wilsons make me yearn for what Colin Brett might have heard if he had voiced the Solstice with a precision positioner instead of the much cruder and significantly less precise method he chose!  Although I wonder whether he would have been able to maintain such tolerances in manufacture, given the technology of loudspeaker cabinets in the early 1980s.

Phase and Time Alignment are different ways of looking at the same thing.  Phase is measured in fractions of the waveform’s oscillation period, and Time Alignment in fractions of a second.  A fixed Phase error corresponds to a progressively smaller time error as the frequency gets higher and higher.  Alternatively, as the frequency gets higher, a fixed amount of time represents a progressively larger fraction of the period of the oscillation, and therefore of its Phase.  So ‘Time Alignment’ is more critical at higher frequencies than at lower ones, because a given time error induces – or corrects for – a larger Phase error.
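
As a quick worked example of that relationship (assuming a nominal speed of sound of roughly 343 m/s – the figures are illustrative only), here is what a tenth of a millimetre of driver offset amounts to in phase terms at various frequencies:

```python
# How much phase error does a fixed driver offset represent at different
# frequencies?  Assumes sound travels at roughly 343 m/s; the 0.1 mm figure is
# the adjustment resolution mentioned above for the Big Wilsons.
offset_mm = 0.1
delay_s = (offset_mm / 1000.0) / 343.0       # about 0.3 microseconds

for f in (100, 1_000, 10_000, 20_000):
    phase_deg = 360.0 * f * delay_s          # fraction of a period, in degrees
    print(f"{f:>6} Hz: {phase_deg:.3f} degrees")
# 100 Hz comes out at around 0.01 degrees, 20 kHz at around 2 degrees.
```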

So to the extent that the Big Wilsons provide a crude “Phase Response” correction tool, and to the extent that the audible changes heard by the listener in response to those corrections represent the audibility of phase, we can look at various processes that affect the phase response of an audio signal and compare those to the magnitude of the phase errors which are ‘audible’ on the Big Wilsons.  There are a lot of ‘ifs’ in there, but if you bear with me it might be instructive.

I like digital filters when it comes to this sort of discussion, because digital filters can – if designed properly – have a known and precisely constrained effect.  By constrained, I mean that all of their effects are knowable and are precisely quantifiable, even if, like the ‘phase response’, we may have trouble knowing what they all mean in terms of audibility.  By contrast, in an analog filter, both capacitors and inductors are in reality complex physical constructs whose behaviour we can only ever approximate, and can never precisely know.

I want to look at a simple low-pass filter and try to draw some very general conclusions regarding the audibility (or otherwise) of its phase response.  I am going to choose a filtering operation I know quite well – a digital low-pass filter designed to convert a DSD source signal to PCM.  Filters similar to these are used in virtually all modern PCM ADCs.  Let’s make some simplistic assumptions for the design of that filter.  We’ll specify the low-pass corner frequency to be 20kHz, the accepted upper limit of human audibility.  In order to eliminate any aliasing effects the filter needs to eliminate all signals above one half of the PCM sampling rate.  If the PCM bit depth is 24 bits, then we need to attenuate such frequencies by at least 144dB.  Finally, we want the character of the filter to have a Pass Band (the region below the corner frequency) with a frequency response that is as flat as possible.  There are some other parameters I won’t trouble you with.  Let’s go away and design some filters and see how they look.

We’ll start by designing filters for 24/88.2, 24/176.4, and 24/352.8 PCM formats.  We’ll come back to 16/44.1 PCM later because, as we’ll see, it is a lot more complicated.  The first decision we need to make is regarding the type of filter we want to use.  There are two types of filter that we would ideally prefer to choose from, both of which have a flat frequency response characteristic in the Pass Band.  Those are the Butterworth and Type-II Chebyshev filters.  The Butterworth has the advantage that its attenuation keeps on increasing the higher the frequency gets, whereas the Type-II Chebyshev only provides a minimum guaranteed attenuation.

With each filter design we are going to look at two things.  First, the ‘order’ of the filter.  This is something I am not going to get too deeply into, save to say that if the ‘order’ is too high then the filter may become unstable or inaccurate.  You’re going to have to take my word for it if I say the order of a filter is too high.  Second, we’re going to look at the ‘Group Delay’ of the filter.  This is a calculation that takes the Phase Response, corrects for the phase-vs-frequency relationship, and spits out the corresponding time delay.  In essence, if we had a hypothetical loudspeaker that had one drive unit for every frequency, ‘Group Delay’ would tell us how far forward or backward we would have to adjust the position of each drive unit – Wilson style – to correct for it.  The important thing here is the difference between the corrected positions of the bass unit (the ‘lowest frequency driver’) and the 20kHz unit (the ‘highest frequency driver’).  I will call that the ‘Wilson Length’ – the distance by which the tweeter position would have to be adjusted in order to correct for it.  This is the result that I will report.  I hope that makes sense.
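
For anyone who wants to play along at home, here is a rough Python/SciPy sketch of how such a calculation might be set up.  It is not the tool I actually used, and the orders and Wilson Lengths it reports will depend on the passband ripple and stopband attenuation figures you plug in, so don’t expect it to reproduce the numbers quoted below exactly.

```python
# A rough sketch of the "Wilson Length" calculation: design a low-pass filter
# to a given specification, compute its group delay across the audio band, and
# convert the spread in that delay into an equivalent driver offset at the
# speed of sound.  The numbers are illustrative only; the exact order and
# length depend on the ripple and attenuation figures chosen.
import numpy as np
from scipy import signal

FS = 2_822_400      # the DSD64 rate at which the filter notionally operates (Hz)
C = 343.0           # speed of sound, m/s

def wilson_length_mm(sos, f_lo=20.0, f_hi=20_000.0, n=4096):
    """Spread of group delay between f_lo and f_hi, expressed in millimetres."""
    freqs = np.linspace(f_lo, f_hi, n)
    _, h = signal.sosfreqz(sos, worN=freqs, fs=FS)
    phase = np.unwrap(np.angle(h))
    group_delay = -np.gradient(phase, 2.0 * np.pi * freqs)   # seconds
    return (group_delay.max() - group_delay.min()) * C * 1000.0

# Corner at 20 kHz, at least 144 dB down by 44.1 kHz (the Nyquist frequency of
# 24/88.2 PCM), with 0.1 dB of passband ripple allowed.
for ftype, ordfunc in (("butter", signal.buttord), ("cheby2", signal.cheb2ord)):
    order, _ = ordfunc(20_000, 44_100, gpass=0.1, gstop=144, fs=FS)
    sos = signal.iirdesign(20_000, 44_100, gpass=0.1, gstop=144,
                           ftype=ftype, output="sos", fs=FS)
    print(ftype, order, round(wilson_length_mm(sos), 1), "mm")
```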

We’ll start with a Butterworth filter for 24/88.2 PCM.  After doing my Pole Dance, what I come up with is a 31st-order filter, whose ‘Wilson Length’ is 14mm.  That 31st-order filter is a non-starter to begin with.  For 24/176.4 PCM the filter is 17th-order, which ought to be acceptable, and its Wilson Length is 3.5mm.  For 24/352.8 PCM, the filter is 12th-order, which is fine, and the Wilson Length is 1.3mm.  Given that experience with the Big Wilsons suggests that the Wilson Length needs to be optimized to within a fraction of a millimetre, it implies that the phase distortions of ALL of these filters could well result in audible deterioration of the perceived sound quality.

Type-II Chebyshev filters are the traditional workhorse for low-pass audio filters because they give good frequency response without requiring as high an order as the equivalent Butterworth.  For the three applications above, the filter orders work out to be 18th, 12th and 9th respectively, all of which ought to be acceptable.  Their Wilson Lengths work out to be 7.6mm, 1.8mm, and 0.6mm respectively.  In all, the Type-II Chebyshev filters seem to be slightly better than their Butterworth counterparts, although without really knocking the Wilson Length parameter out of the park.  Only the 24/352.8 filter appears to have a shot at being ‘inaudible’.  Bear in mind, though, that the specific filter designs I described may not be optimal for those applications.  They were just chosen for illustrative purposes.

At this point it is instructive to look at the 16/44.1 variant of this filter.  With only 16 bits of bit depth we can reduce the attenuation requirement to 96dB, but with the Nyquist frequency of 22.05kHz so close to the corner frequency of 20kHz this places great demands on the filter.  With a Butterworth design what we get is a 192nd-order filter, which is a total non-starter.  With the Type-II Chebyshev it is a 44th-order filter which, despite being a much smaller number, is still of no practical value.  Getting the level of performance we require will need what is called an Elliptic filter.  This can actually be achieved with an acceptable filter order, but an analysis of its ‘Wilson Length’ behaviour is both more complicated and, in any case, will be much poorer than any of the results obtained above.
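
For completeness, here is the same sort of order calculation for the 16/44.1 case – again only a sketch, with the usual caveat that the exact numbers depend on the ripple and attenuation figures chosen, so they won’t match the ones quoted above precisely.

```python
# Filter orders needed for the 16/44.1 case: a 20 kHz corner with at least
# 96 dB of attenuation by 22.05 kHz.  The Butterworth order comes out
# impractically high, the Chebyshev II order is much lower but still hefty,
# and only the elliptic design lands at a modest order.  Exact values depend
# on the ripple and attenuation figures chosen.
from scipy import signal

FS = 2_822_400   # the DSD64 rate at which the filter notionally operates (Hz)
spec = dict(wp=20_000, ws=22_050, gpass=0.1, gstop=96, fs=FS)

print("Butterworth :", signal.buttord(**spec)[0])
print("Chebyshev II:", signal.cheb2ord(**spec)[0])
print("Elliptic    :", signal.ellipord(**spec)[0])
```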

The above analysis seriously reduces a complex subject to an unfairly simple catch-all number, but I think it has some value if taken on its own terms.  I hold the view that the sound of PCM is the sound of the anti-aliasing filters to which the source signal has to be subjected prior to being encoded in the PCM format.  We understand those filters very well, and in terms of frequency response we know that they ought to be inaudible, but we are less clear on whether their phase responses are in any way audible.  I personally suspect that the things we don’t like about PCM are the artifacts of the phase response of its anti-aliasing filters, which are baked into the signal.  If we are willing to accept at face value that (i) the ‘time-alignment’ capability of the Big Wilsons provides an audible and beneficial optimization; (ii) the underlying cause of such optimizations is changes in signal phase; and (iii) the amount of adjustment needed to bring the Big Wilsons into their ‘optimum alignment’ reflects the sensitivity of the human brain to phase errors; then this would seem to be a good basis for arguing that the phase distortions induced by anti-aliasing filters are more than capable of adversely impacting the sound of PCM audio – particularly so in the 16/44.1 format.

I think that’s rather interesting.  While I recognize that there are a lot of broad sweeps and generalizations involved in all this, I think it has significant validity, provided it is confined to being taken on its own terms.

I want to conclude by commenting on ‘time alignment’ in the specific context of speaker design.  Clearly, if you apply the same signal to two drive units of a loudspeaker, there can be only one position at which those drive units are correctly time-aligned to one another.  Any other position would be, by definition, out of alignment.  So why offer the possibility of adjusting that alignment?  The answer lies in the caveat “… if you apply the same signal …”, because we don’t.  Different drive units receive different signals, each contoured to the drive unit’s needs by the loudspeaker’s crossover.  Crossovers are filters, and yes, they too have a phase response.  Those phase responses mean that there is usually no one fundamentally correct time alignment.  Wherever the alignment is set there are going to be some frequencies for which the alignment is ideal, and others for which it is less than ideal, and this may change with, for example, the relative listener position.  Whether or not an audibly optimum position exists at all will vary from speaker to speaker, according to its design.  So it doesn’t necessarily follow that you will be able to replicate the “Wilson Effect” by jerry-rigging some sort of alignment capability on your own loudspeakers, although, as I have mentioned in a previous post, simply tilting the speaker can have a surprising effect.  I suspect the speakers have to be designed from the ground up to take full advantage of this design approach.

An opinion piece this week designed to get your backs up and make you think.  You read a lot of brouhaha these days about how musicians are not making any money out of streaming services.  There are so many streaming services available – some even offer high-resolution lossless content – and much like Netflix in the video domain, we as consumers can now access a lot of content for a nominal (i.e. affordable) outlay.  How, you might wonder, can the musicians who create the music in the first place be making any money out of it?

Recently, a study has been doing the rounds which purports to analyze the revenues of the streaming service Spotify, and indicates how that revenue is divvied up among the Streaming Service itself, the Record Labels, the Writers/Composers, and the Artists.  The report is available on the web site of Music Business World, and was prepared by the accounting firm Ernst & Young, so it has at least a minimum acceptable level of credibility.

Ask yourself this – according to the report, for every dollar you spend on Spotify, just how much of it ends up in the pocket of the artist whose music you are listening to?  Before you go on to read the answer, I want you to ponder the issue for a moment and ask yourself how much you think OUGHT to go to the artist.  Also, stop for a moment to consider the rationale behind your calculation, so that it is a little bit more than a number you pulled out of thin air.  On what basis should the artist receive whatever it was that you thought was appropriate?

So what did you come up with…?  50 cents?  20 cents?  10 cents?  The actual answer is less than 7 cents.  Not seven cents every time you listen to a track, but 7 cents out of every dollar you spend.  If you subscribe to their premium service that’s about $10 a month.  So your subscription to Spotify generates 70 cents a month to be shared among all of the artists that you listen to.  Let’s imagine that you listen to 20-25 tracks a day, and let’s assume that the money gets split evenly among the artists on a per-play basis.  In that scenario you are playing about 700 tracks a month.  So each time you play a track, the artist you are listening to earns something like one tenth of one cent.

In some circles, this has aroused the anger of musicians who feel that the Spotifys of this world are screwing them out of their rightful earnings.  First Napster, then bit torrents and file sharing, and now this!

There are two problems with this.  The first is that, as best as anyone can tell, none of these streaming services are actually making any money!  It is one thing to argue a case against someone who is making scads of money off the backs of others, but another thing entirely to vent your spleen at someone who isn’t even profitable – unless your complaint is about the lack of any profit itself, which isn’t the case here.

Can it really cost that much money to run Spotify?  Which brings me to the second problem.  What happens to all the money that you pay to Spotify?  The answer is that Spotify in turn pays the majority of it in fees to the labels.  Spotify pays about 17 cents on the dollar in taxes, and uses 20 cents to run its own operations.  The rest – amounting to nearly two-thirds of their revenues – is paid directly to the Record Labels who manage the distribution to the Artists.  In other words, Spotify doesn’t get to decide how much of their take goes to the Artists – that is entirely within the purview of the Record Labels themselves.

So now, let’s take a look at the money that the Labels receive.  How do they distribute that?  According to the Music Business World report only 10% of what they receive goes to the Artists, and 15% goes to the Songwriters and Publishers, which means that the Labels keep a whopping 75% of the pie for themselves.  That’s a lot of pie.

It is therefore wrong-headed for the Angry Artists to get all stroppy about Spotify eating their lunch.  It is the Labels who are doing all the munching.  And it has been thus for as long as there has been a music industry.  But, the argument goes, it is a different world in 2015.  Labels used to have to pay for record stamping plants, or even CD stamping plants.  They had to maintain a sales force to get their product stocked by the music stores, and a promotional force to get their customers into the stores.  Plus the costs of transporting the product internationally.  Today this doesn’t happen any more.  All of the above is theoretically replaced by an “Upload” button that someone has to punch.

But even taking all of that into account, it still misses the point entirely for the Artists to be taking pot shots at the Labels.  If the Artist feels that the Label is charging too much for what it provides, then the solution is simple – they don’t have to sign with a Label.  Like just about any transaction, if you don’t like the price, you don’t have to make the purchase.  Unfortunately, though, the majority of Artists don’t really have the option of not signing with a Label.  The reality is that as an Artist you hope to generate enough buzz that a Label – any freakin’ Label will do! – will deign to offer you a deal.  The idea that you can shop around and choose the one that offers you the best deal is a pipe dream for all but the privileged few.

For the Artist, what are the alternatives?  The obvious one is that they can start their own Label.  Sure they can … there’s nothing to stop them.  Well, except one thing.  You’ll need some money.  And as an Artist without a record deal capable of putting one tenth of a cent in your pocket every time someone plays one of your tunes on Spotify, you won’t actually have any money.

The view from the other side of the fence is not all roses either.  As a Label, you are hopefully making money from your roster of Artists.  But they come and go, as do their sales.  You always need to be replenishing your portfolio.  For every new Artist that you have a budget to take on there are a hundred who are convinced that they are The One.  You need to be really smart about which ones you sign and which ones you pass on.  After all, you’re not as dumb as the Decca executive who passed on The Beatles because guitar music was going out of style, are you?

Once you’ve signed a new Artist you are going to need to pay for some studio time to record their new album.  You’ll need to pay people to design the cover work and take publicity shots.  You’ll probably need professional video work doing.  You’ll need to schedule radio and TV spots if you’re sufficiently gung-ho about their prospects.  And you’ll need to cut that deal with Spotify.  All that expense must be incurred without any guarantee that you’ll ever get a penny in return.  And for every Artist who generates a handy revenue stream for you, there will be four or five who fail to make any sort of impact at all.  On top of it all, it may be you who screws up.  The Artist may leave you and sign for another Label, and under their guidance hit the big time.

For this reason, most Labels are very controlling when it comes to their stable of Artists.  They will control a large part of the product, how it sounds, whose arrangements are used – they’ll even kick out members of the band and bring in better session musicians.  If they don’t like your songs they’ll use their own songwriters.  The Labels are in the business of knowing what will sell and what won’t.  They won’t always get it right, but like a professional stock trader, they’ll get it right way more often than you will.  Even the poor sod at Decca who turned down The Beatles (his name was Dick Rowe) went on to sign The Rolling Stones.  Consequently, the Artists very soon find out exactly where on the totem pole a place has been reserved for them, even as their backs are being patted and their egos inflated.

So, as a musician, if you can do all that then you don’t need the services of a Label, and you’ll make ten times as much from Spotify as you might otherwise have done.  If not, then you have little choice but to work within the established Label system – if you can get one sufficiently interested.  Otherwise, as a certain Norman Tebbit might have put it, you should consider getting ‘on yer bike’ and finding a proper job :)

Here’s the thing about musicians in particular, but Artists generally.  You are only an Artist while you are creating art for your own personal satisfaction.  As soon as you aim to sell it for even a modest profit you become a businessperson, no different from a restaurant owner or a plumber.  It’s a dog-eat-dog world, whether you’re selling art or amplifiers, and you need to have a minimum of business savvy if you are going to survive in it.  You need to identify smart things to do and dumb things to avoid doing.  The world has little sympathy for poor businesspeople.  And it won’t pay $10 for something if there is something else it thinks might be just as good priced at $9.95.  Don’t take my word for it.  Spend some time in Walmart.

My advice to musicians who fret about how much they are getting from Spotify is simple.  You are businesspeople first and foremost, and you had better start looking at yourselves in that light.  Would you open a paint store that only sold green paint?  I know I wouldn’t.  The thing about business is that sometimes the best thing to do is not the same as the thing you really wanted to do.  If you can’t – or won’t – see that, then your prospects for success will have a lot in common with buying a lottery ticket.  Which is fine, because most of us do not make particularly good businesspeople, and rarely win the lottery.  In which case you should go back to being an artist, and create art for no purpose other than your own satisfaction – in your spare time of course, since you’ll have a ‘proper’ job to do as well.

I have been using iTunes 12.1.0 for a couple of days now.  It seems to work fine with BitPerfect.  The only issues I am aware of are those which also affected previous versions of iTunes, and they represent mostly edge cases and minor inconveniences.  BitPerfect users can feel comfortable upgrading from iTunes 12.0.x.

As support for regular DSD (aka DSD64) becomes close to a requirement not only for high-end DACs but also for a number of entry-level models, so the cutting edge of audio technology moves ever upward to more exotic versions of DSD denoted by the terms DSD128, DSD256, DSD512, and so on.  What are these, why do they exist, and what are the challenges faced in playing them?  I thought a post on that topic might be helpful.

Simply put, these formats are identical to regular DSD, except that the sample rate is increased.  The benefit in doing so is twofold.  First, you can reduce the magnitude of the noise floor in the audio band.  Second, you can push the onset of undesirable ultrasonic noise further away from the audio band.

DSD is a noise-shaped 1-bit PCM encoding format (Oh yes it is!).  Because of that, the encoded analog signal can be reconstructed simply by passing the raw 1-bit data stream through a low-pass filter.  One way of looking at this is that at any instant in time the analog signal is very close to being the average of a number of consecutive DSD bits which encode that exact moment.  Consider this: the average of the sequence 1,0,0,0,1,1,0,1 is exactly 0.5 because it comprises four zeros and four ones.  Obviously, any sequence of 8 bits comprising four zeros and four ones will have an average value of 0.5.  So, if all we want is for our average to be 0.5, we have many choices as to how we can arrange the four zeros and four ones.

That simplistic illustration is a good example of how noise shaping works.  In effect we have a choice as to how we can arrange the stream of ones and zeros such that passing it through a low pass filter recreates the original waveform.  Some of those choices result in a lower noise floor in the audio band, but figuring out how to make those choices optimally is rather challenging from a mathematical standpoint.  Theory, however, does tell us a few things.  The first is that you cannot just take noise away from a certain frequency band.  You can only move it into another frequency band (or spread it over a selection of other frequency bands).  The second is that there are limits to both how low the noise floor can be depressed at the frequencies where you want to remove noise, and how high the noise floor can be raised at the frequencies you want to move it to.
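
To make this a little more tangible, here is a toy first-order sigma-delta modulator written in Python.  Real DSD modulators are typically of much higher order, with carefully optimized noise transfer functions, so treat this purely as an illustration of the principle: the 1-bit stream, once low-pass filtered, tracks the waveform it encodes, while the quantization error is pushed up in frequency.

```python
# A toy first-order sigma-delta modulator.  It encodes a 1 kHz tone as a stream
# of +/-1 bits at the DSD64 rate, then recovers the tone with a crude
# moving-average low-pass filter.  Real DSD modulators are far more elaborate,
# so this only illustrates the principle described above.
import numpy as np

fs = 2_822_400                              # DSD64 sample rate
t = np.arange(fs // 100) / fs               # 10 ms of signal
x = 0.5 * np.sin(2 * np.pi * 1000 * t)      # a 1 kHz tone at half scale

bits = np.empty_like(x)
v = 0.0                                     # integrator state
y = 1.0                                     # previous output bit, as +/-1
for n, sample in enumerate(x):
    v += sample - y                         # accumulate the input-vs-output error
    y = 1.0 if v >= 0.0 else -1.0           # 1-bit quantizer
    bits[n] = y

# Crude reconstruction: average each bit with its ~63 neighbours.
recovered = np.convolve(bits, np.ones(64) / 64.0, mode="same")

err = recovered[1000:-1000] - x[1000:-1000]
print("rms error:", np.sqrt(np.mean(err ** 2)))   # a small fraction of the 0.5 amplitude
```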

Just like digging a hole in the ground, what you end up with is a low frequency area where you have removed as much of the noise as you can, and a high frequency area where all this removed noise has been piled up.  If DSD is to work, the low frequency area must cover the complete audio band, and the noise floor there must be pushed down by a certain minimum amount.  DSD was originally developed and specified to have a sample rate of 2,822,400 samples per second (2.8MHz) as this is the lowest convenient sample rate at which we can realize those key criteria.  We call it DSD64 because 2.8224MHz is exactly 64 times the standard sample rate of CD audio (44.1kHz).  The downside is that the removed noise starts to pile up uncomfortably close to the audio band, and it turns out that all the optimizing in the world does not make a significant dent in that problem.

This is the fundamental limitation of DSD64.  If we want to move the ultrasonic noise further away from the audio band we have to increase either the bit depth or the sample rate.  Of the two, there are, surprisingly enough, perhaps more reasons to want to increase the bit depth than the sample rate.  However, these are trumped by the great advantages in implementing an accurate D/A converter if the ‘D’ part is 1-bit.  Therefore we now have various new flavours of DSD with higher and higher sample rates.  DSD128 has a sample rate of 128 times 44.1kHz, which works out to about 5.6MHz.  Likewise we have DSD256, DSD512, and even DSD1024.

Of these, perhaps the biggest bang for the buck is obtained with DSD128.  Already, it moves the rise in the ultrasonic noise to nearly twice as far from the audio band as it was with DSD64.  Critical listeners – particularly those who record microphone feeds direct to DSD – are close to unanimous in their preference for DSD128 over DSD64.  The additional benefits in going to DSD256 and above seem to be real enough, but definitely fall into the realm of diminishing returns.  And even though the remarkably low cost and huge capacity of today’s hard disks make the storage of a substantial DSD library a practical possibility, if that library were in DSD512, for example, it would start to represent a significant expense in both disk storage and download bandwidth.  In any case, as a result of all these developments, DSD128 recordings are now beginning to be made available in larger and larger numbers, and very occasionally we get sample tracks made available for evaluation in DSD256 format.  However, at the time of writing I don’t know where you can go to download samples of DSD512 or higher.

In the Apple World where BitPerfect users live, playback of DSD requires the use of the DoP (“DSD over PCM”) protocol.  This dresses up a DSD bitstream in a faux PCM format, where a 24-bit PCM word comprises 16 bits of raw DSD data plus an 8-bit marker which identifies it as such.  Windows users have the ability to use an ASIO driver which dispenses with the need for the 8-bit marker and transmits the raw DSD data directly to the DAC in its “native” format.  ASIO for Mac, while possible, remains problematic.
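
For the curious, here is roughly what DoP framing looks like, sketched in Python (a hypothetical helper written for illustration – it is not BitPerfect code).  Sixteen raw DSD bits occupy the bottom of each 24-bit PCM word, and the top 8 bits carry the DoP marker, which alternates between 0x05 and 0xFA from sample to sample so that the DAC can tell the stream apart from genuine PCM.  Packing 16 bits per sample is also why DSD64’s 2.8224MHz bit rate turns into a 176.4kHz PCM frame rate.

```python
# A sketch of DoP framing for one channel: 16 raw DSD bits are packed into the
# low 16 bits of a 24-bit PCM word, with an alternating 0x05/0xFA marker in the
# top 8 bits.  (Hypothetical helper for illustration - not BitPerfect code.)
MARKERS = (0x05, 0xFA)

def dop_words(dsd_bytes):
    """Yield 24-bit DoP words from a raw DSD byte stream (2 bytes per word)."""
    for i in range(0, len(dsd_bytes) - 1, 2):
        marker = MARKERS[(i // 2) % 2]
        yield (marker << 16) | (dsd_bytes[i] << 8) | dsd_bytes[i + 1]

# Four bytes of DSD data become two DoP words:
print([f"{w:06x}" for w in dop_words(bytes([0xAA, 0x55, 0xFF, 0x00]))])
# ['05aa55', 'faff00']

# 2,822,400 DSD bits per second / 16 bits per word = 176,400 PCM frames per second
print(2_822_400 // 16)   # 176400
```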

As mentioned, DoP encoding transmits the data to the DAC using a faux PCM stream format.  For DSD64 the DAC’s USB interface must provide 24-bit/176.4kHz support, which is generally not a particularly challenging requirement.  For DSD128 the required PCM stream format is 24-bit/352.8kHz which is still not especially challenging, but is less commonly encountered.  But if we go up to DSD256 we now have a requirement for a 24-bit/705.6kHz PCM stream format.  The good news is that your Mac can handle it out of the box, but unfortunately, very few DACs offer this.  Inside your DAC, if you prise off the cover, you will find that the USB subsystem is separate from the DAC chip itself.  USB receiver chipsets are sourced from specialist suppliers, and if you want one that will support a 24/705.6 format it will cost you more.  Additionally, if you are currently using a different receiver chipset, you may have a lot of time and effort invested in programming it, and you will have to return to GO if you move to a new design (do not collect $200).  The situation gets progressively worse with higher rate DSD formats.

Thus it is that we see examples of DSD-compatible DACs such as the OPPO HA-1 which offers DSD256 support, but only in “native” mode.  What this means is that if you have a Mac and are therefore constrained to using DoP, you need access to a 24/705.6 PCM stream format in order to deliver DSD256, and the HA-1 has apparently been designed with a USB receiver chipset that does not support it.  It may not be as simple as that, and there may be other considerations at play, but if so I am not aware of them.

Interestingly, the DoP specification does offer a workaround for precisely this circumstance.  It provides for an alternative to a 2-channel 24/705.6 PCM format using a 4-channel 24/352.8 PCM format.  The 8-bit DoP marker specified is different, which enables the DAC to tell 4-channel DSD128 from 2-channel DSD256 (they would otherwise be identical).  Very few DAC manufacturers currently support this variant format.  Mytek is the only one I know of – as I understand it their 192-DSD DAC supports DSD128 using the standard 2-channel DoP over USB, but using the 4-channel variant DoP over FireWire.

Because of its negligible adoption, BitPerfect currently does not support the 4-channel DoP variant.  If we did, it would require some additional configuration options in the DSD Support window.  I worry that such options are bound to end up confusing people.  For example, despite what our user manual says, you would not believe the number of customers who write to me because they have checked the “dCS DoP” checkbox and wonder why DSD playback isn’t working!  Maybe they were hoping it would make their DACs sound like a dCS, I dunno.  I can only imagine what they will make of a 2ch/4ch configurator!!!

As a final observation, some playback software will on-the-fly convert high-order DSD formats which are not supported by the user’s DAC to a lower-order DSD format which is.  While this is a noble solution, it should be noted that format conversion in DSD is a fundamentally lossy process, and that all of the benefits of the higher-order DSD format – and more – will as a result be lost.  In particular, the ultrasonic noise profile will be that of the output DSD format, not that of the source DSD format.  Additionally, DSD bitstreams are created by Sigma-Delta Modulators.  These are complex and very challenging algorithms which are seriously hard to design and implement successfully, particularly if you want anything beyond modest performance out of them.  The FPGA-based implementation developed for the PS Audio DirectStream DAC is an example of a good one, but there are some less-praiseworthy efforts out there.  In general, you will obtain audibly superior results pre-converting instead to PCM using DSD Master.

In mathematics, the word ‘convolution’ describes a very important class of manipulations.  If you want to know more about it, a pretty good treatment is shown on its Wikipedia page.  And even if you don’t, I am going to briefly summarize it here before going on to make my point :)

A convolution is an operation performed on two functions, or on two sets of data.  Typically (but not always) one is the actual data that we are trying to manipulate, and the other is a weighting function, or set of weights.  Convolution is massively important in the field of signal processing, and therefore is something that anybody who wants (or needs) to talk knowledgeably about digital audio needs to bone up on.  The most prominent convolution processes that you may have heard of are Fourier Transforms (which are used to extract from a waveform its audio spectrum) and digital filtering.  It is the latter of those that I want to focus on here.

In very simple terms, a filter (whether digital or analog) operates as a convolution between a waveform and an impulse response.  You will have heard of impulse responses, and indeed you may have read about them in some of my previous posts.  In digital audio, an impulse response is a graphical representation of the ‘weights’ or ‘coefficients’ which define a digital filter.  Complicated mathematical relationships describe the way in which the impulse response relates to the key characteristics of the filter, and I have covered those in my earlier posts on ‘Pole Dancing’.
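
If you want to see that equivalence for yourself, here is a minimal Python sketch (plain NumPy and SciPy, nothing to do with any particular DAC or player): a filter’s response to a single impulse is exactly its coefficient set, and filtering any other signal is the very same convolution with those coefficients.

```python
# Filtering is convolution with the impulse response.  A small FIR low-pass is
# used two ways: its response to a single impulse is exactly its coefficient
# set, and filtering an arbitrary signal with scipy's lfilter produces the same
# result as convolving that signal with those coefficients directly.
import numpy as np
from scipy import signal

taps = signal.firwin(31, cutoff=0.2)        # a small FIR low-pass: the 'weights'

impulse = np.zeros(64)
impulse[0] = 1.0
print(np.allclose(np.convolve(impulse, taps)[:len(taps)], taps))       # True

x = np.random.randn(1000)                   # any old waveform
direct = np.convolve(x, taps)[:len(x)]      # convolution with the impulse response
filtered = signal.lfilter(taps, 1.0, x)     # the same thing, done as 'filtering'
print(np.allclose(direct, filtered))        # True
```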

Impulse responses are therefore very useful.  They are nice to look at, and easy to categorize and classify.  Unfortunately, it has become commonplace to project the aesthetic properties of the impulse response onto the sonic properties arising from the filter which uses it.  In simple language, we see a feature on the impulse response, and we imagine that such a feature is impressed onto the audio waveform itself after it comes out of the filter.  It is an easy mistake to make, since the convolution process itself is exactly that – a mathematical impression of the impulse response onto the audio waveform.  But the mathematical result of the convolution is really not as simple as that.

The one feature I see misrepresented most often is pre-ringing.  In digital audio, an impulse is just one peak occurring in a valley of flat zeros.  It is useful as a tool to characterize a filter because it contains components of every frequency that the music bit stream is capable of representing.  Therefore if the filter does anything at all, the impulse is going to be disturbed as a result of passing through it.  For example, if you read my posts on square waves, you will know that removing high frequency components from a square wave results in a waveform which is no longer square, and contains ripples.  Those ripples decay away from the leading edge of the square wave.  This is pleasing in a certain way, because the ripples appear to be caused by, and arise in response to, the abrupt leading edge of the square wave.  In our nice ordered world we like to see effect preceded by cause, and are disturbed by suggestions of the opposite.

And so it is that with impulse responses we tend to be more comfortable seeing ripples decaying away after the impulse, and less comfortable when they precede the impulse, gathering in strength as they approach it.  Our flawed interpretation is that the impulse is the cause and the ripples the effect, and if these don’t occur in the correct sequence then the result is bound to be unnatural.  It is therefore common practice to dismiss filters whose impulse response contains what is termed “pre-ringing” because the result of such filters is bound to be somewhat “unnatural”.  After all, in nature, effects don’t precede their cause, do they?
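
If you would like to see what that looks like, here is a minimal sketch (Python/scipy again, with parameters of my own choosing) which designs a conventional linear-phase FIR low-pass filter – whose symmetric impulse response rings both before and after its central peak – and then derives a minimum-phase version of the same filter, whose ringing all occurs after the peak.

```python
import numpy as np
from scipy import signal

fs = 44100

# A linear-phase FIR low-pass filter: its taps are symmetric about the centre,
# so its impulse response 'pre-rings' before the main peak as well as after it
h_linear = signal.firwin(numtaps=255, cutoff=20000, fs=fs)

# A minimum-phase equivalent: (approximately) the same magnitude response, but
# with all of the ringing now following the peak instead of preceding it
h_minimum = signal.minimum_phase(h_linear, method='homomorphic')

# For an FIR filter the taps themselves ARE the impulse response, so plotting
# h_linear and h_minimum side by side shows the difference directly
peak_linear = np.argmax(np.abs(h_linear))    # lands in the middle of the filter
peak_minimum = np.argmax(np.abs(h_minimum))  # lands at (or very near) the start
```

Both versions remove essentially the same frequencies; what differs is how the unavoidable ringing is distributed in time.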

I would like you to take a short break, and head over to your kitchen sink for a moment.  Turn on the tap (or faucet, if you prefer) and set the water flow to a very gentle stream.  What we are looking for is a smooth flow with no turbulence at all.  We call this ‘laminar’ flow.  What usually happens, if the tap outlet is sufficiently far above the bottom of the sink, is that the laminar flow is maintained for some distance and then breaks up into a turbulent flow.  The chances are good that you will see this happening, but it is no problem if you don’t – so long as you can find a setting that gives you a stable laminar stream.  Now, take your finger, and gently insert it into the water stream.  Look closely.  What you will see are ripples forming in the water stream **above** your finger.  If you don’t see them, gradually move your finger up towards the tap and they should appear (YTMV/YFMV).  What you will be looking at is an apparently perfect example of an effect (the ripples) occurring before, or upstream of, the cause (your finger).

What I have demonstrated here is not your comfortable world breaking down before your eyes.  What is instead breaking down is the comfort zone of an over-simplistic interpretation of what you saw.  Because the idea of the finger being the cause and the ripples being the effect is not an adequate description of what actually happened.

In the same way, the notion that pre-ringing in the impulse response of a filter results in sonic effects which precede their cause in the resultant audio waveform is not an adequate description of what is happening.  However, the misconception gains credence for an important, if inconvenient, reason, which is that filters which exhibit pronounced pre-ringing do in fact tend to be less preferred than those which don’t.  This sort of thing happens often in science – most notably in medical science – and when it does it opens a door to misinformation.  In this case, the potential for misinformation lies in the reason given for why one filter sounds better than another – that the one with pre-ringing in its impulse response produces sounds that precede the things which caused them.  By all means state your preference for filters with a certain type of impulse response, but please don’t justify your preference with flawed reasoning.  It is OK to admit that you are unclear as to why.

I want to finish this with one final example to make my point.  The well-known Nyquist-Shannon sampling theorem states that a regularly sampled waveform can be perfectly recreated if (i) it is perfectly sampled; and (ii) it contains no frequency components at or above one half of the sample rate.  The theorem doesn’t just set forth its claim, it provides a rigorous proof.  In essence, it does this by convolving the sampled waveform with a Sinc() function, in a process pretty much identical to the way a digital filter convolves the waveform with an Impulse Response.  Nyquist-Shannon proves that this convolution results in a mathematically perfect reconstruction of the original waveform if – and only if – the two stipulations I mentioned are strictly adhered to.  This is interesting in the context of this post because the Sinc() function which acts as the Impulse Response exhibits an infinitely long pre-ring of quite significant amplitude.  Neither Nyquist nor Shannon, nor the entire industry which their theorem spawned, harbours any concerns about causality in reconstructed waveforms!
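
For the curious, here is a minimal sketch (Python/numpy, with an arbitrary test tone of my own choosing) of that reconstruction at work: the samples are convolved with a Sinc() kernel exactly as the theorem prescribes, and the original waveform duly re-emerges – infinite pre-ring notwithstanding.

```python
import numpy as np

fs = 8000                                    # sample rate (illustrative)
n = np.arange(64)                            # sample indices
samples = np.sin(2 * np.pi * 441 * n / fs)   # a 441 Hz tone, well below fs/2

# Whittaker-Shannon reconstruction: one Sinc() kernel centred on every sample point
t = np.linspace(0, (len(n) - 1) / fs, 2000)  # a (nearly) continuous time axis
kernels = np.sinc(fs * t[:, None] - n[None, :])
reconstructed = kernels @ samples            # weighted sum of sincs = the original waveform

# Away from the ends of the record (where the sinc tails get truncated), the
# reconstruction matches a freshly computed sin(2*pi*441*t) to high accuracy
```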

Here is a short video that should give pause to anyone who has ever wondered – with the confident skepticism of someone who has never actually tried it – why they shouldn’t simply build a pair of high-end loudspeakers themselves. This person has made his own pair of B&W 800 Diamond loudspeakers. Has he succeeded? We will never know, but it certainly looks most impressive.

In practice, he has restricted himself to making his own set of elaborate cabinets, as it looks as though he has bought all the drive units from B&W. But even so, the overwhelming impression is of the expensive resources he has had to bring to bear to realize the project. OK, he has done the grunt work himself, but the project has clearly taken a HUGE amount of time and effort. Aside from some initial consternation, I imagine that the executives at B&W are having a good chuckle over it.

Presumably his motivation was purely the satisfaction of creating his own work of art. Think about it. How much money can he possibly have saved by doing it himself? Do you think you could do it yourself for less, without sacrificing at least some of the core design objectives?

Whatever, as I contemplate my own B&W 802 Diamonds, I am sure glad I bought mine!

https://www.youtube.com/watch?v=fHgdNQkiNVk

The SACD format was built around DSD right from the start.  But since DSD takes up about four times the disc space of a 16/44.1 equivalent, a different physical disc format was going to be required.  Additionally, SACD was specified to deliver multi-channel content, which increases the storage requirement by another factor of 3 or more, depending on how many channels you want to support.  The only high-capacity disc format that was on the horizon at the time was the one eventually used for DVD, and even this was going to be inadequate for the full multi-channel capability required for SACD.

The solution was to adopt a lossless data compression protocol to reduce the size of a multi-channel DSD master file so that it would fit.  The protocol chosen was called DST, and it is an elaborate DSP-based predictive method.  Essentially, you store a set of numbers that represent the actual data as a mathematical function which you can later use to try to re-create the original data.  You then store a set of additional numbers which represent the differences between the actual data and that attempted recreation.  If you do this properly, the mathematical-function numbers plus the difference data take up less space than the original data.  On a SACD the compression achieved is about 50%, which is pretty good, and permits a lot of content to be stored.
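
The real DST algorithm is considerably more sophisticated than this, but here is a toy sketch in Python of the same ‘model plus differences’ principle, just to show why it pays off: once even a trivial predictor has modelled the predictable part of the data, the left-over differences compress far more readily than the raw data did.

```python
import numpy as np
import zlib

# A toy 'model plus differences' scheme -- NOT the real DST algorithm,
# just an illustration of why the principle works.
rng = np.random.default_rng(0)
t = np.arange(20000)
# Some well-behaved 'audio-like' data: a slow wave plus a little noise
data = (1000 * np.sin(2 * np.pi * t / 500) + rng.normal(0, 5, t.size)).astype(np.int16)

# "Model": predict each sample from the one before it (a trivial 1st-order predictor)
prediction = np.concatenate(([0], data[:-1])).astype(np.int16)
residual = (data - prediction).astype(np.int16)   # the "differences" that get stored

raw_size      = len(zlib.compress(data.tobytes()))
residual_size = len(zlib.compress(residual.tobytes()))
print(raw_size, residual_size)   # the residual stream compresses far more readily
```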

Given that DST compression is lossless, it is interesting that the SACD format allows discs to be mastered with your choice of compressed or non-compressed data.  And, taking a good look at a significant sample of SACDs, it appears that a substantial proportion of those discs do not use compression.  Additionally, if you look closely, you will see that almost all of the serious audiophile remasters released on SACD are uncompressed.  So the question I have been asking is – is there any reason to believe that DST-compressed SACDs might sound worse than uncompressed ones?

First of all, let me be clear on one thing.  The DST compression algorithm is lossless.  This means that the reconstructed waveform is bit-for-bit identical to the original uncompressed waveform.  This is not at issue here.  Nor is the notion that compressing and decompressing the bits somehow stresses them so that they don’t sound so relaxed on playback.  I don’t buy mumbo jumbo.  The real answer is both simpler than you would imagine (although technically quite complicated), and at the same time typical of an industry which has been known to upsample CD content and sell it for twice the price on a SACD disc.

To understand this, we need to take a closer look at how the DSD format works.  I have written at length about how DSD makes use of a combination of massive oversampling and noise shaping to encode a complex waveform in a 1-bit format.  In a Sigma-Delta Modulator (SDM) the quantization noise is pushed out of the audio band and up into the vast reaches of the ultrasonic bandwidth which dominates the DSD encoding space.  The audio signal only occupies the frequency space below 20kHz (to choose a number that most people will agree on).  But DSD is sampled at 2,822kHz, so there is a vast amount of bandwidth available between 20kHz and the Nyquist limit of 1,411kHz, into which the quantization noise can be swept.

One of the key attributes of a good clean audio signal is that it have low noise in the audio band.  In general, the higher quality the audio signal, the lower the noise it will exhibit.  The best microphones can capture sounds that cannot be fully encoded using 16-bit PCM.  However, 24-bit PCM can capture anything that the best microphones will put out.  Therefore if DSD is to deliver the very highest in audio performance standards it needs to be able to sustain a noise floor better than that of 16-bit audio, and approaching that of 24-bit audio.
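
For reference, the usual rule of thumb for the theoretical dynamic range of N-bit PCM is 6.02N + 1.76 dB (ignoring dither and every other real-world complication):

```python
# Rule-of-thumb theoretical dynamic range of N-bit PCM: ~6.02*N + 1.76 dB
for n_bits in (16, 24):
    print(f"{n_bits}-bit PCM: ~{6.02 * n_bits + 1.76:.0f} dB")
# 16-bit PCM: ~98 dB
# 24-bit PCM: ~146 dB
```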

The term “Noise Shaping” is a good one.  Because quantization noise cannot be eliminated, all you can hope to do is to take it from one frequency band where you don’t want it, and move it into another where you don’t mind it – and in the 1-bit world of DSD there is an awful lot of quantization noise.  This is the job of an SDM.  The design of the SDM determines how much noise is removed from the audio frequency band, and where it gets put.  Mathematically, DSD is capable of encoding a staggeringly low noise floor in the audio band.  Something down in the region of -180dB to -200dB has been demonstrated.  What good DSD recordings achieve is nearer to -120dB, and the difference is partly due to the fact that practical real-world SDM designs seriously underperform their theoretical capabilities.  But it also arises because better performance requires a higher-order SDM design, and beyond a certain limit high-order SDMs are simply unstable.  A workmanlike SDM would be a 5th-order design, but the best performance today is achieved with 8th or 9th order SDMs.  Higher than that, and they cannot be made to work.

So how does a higher-order SDM achieve superior performance?  The answer is that it packs more and more of the quantization noise into the upper reaches of the ultrasonic frequency space.  So a higher-performance higher-order SDM will tend to encode progressively more high-frequency noise into the bitstream.  A theoretically perfect SDM will create a Bit Stream whose high frequency content is virtually indistinguishable from full-scale white noise.
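
Here is a toy sketch in Python of a 1st-order Sigma-Delta Modulator – nothing like the 5th- to 9th-order designs discussed above, but even this minimal version shows the quantization noise being swept up and out of the audio band.

```python
import numpy as np

def sdm_first_order(x):
    """Toy 1st-order SDM: integrate the error, quantize to +/-1, feed back."""
    integrator = 0.0
    previous_output = 0.0
    bits = np.empty(len(x))
    for i, sample in enumerate(x):
        integrator += sample - previous_output
        previous_output = 1.0 if integrator >= 0 else -1.0
        bits[i] = previous_output
    return bits

fs_dsd = 2822400                             # the DSD sample rate
n = 1 << 16
t = np.arange(n) / fs_dsd
x = 0.5 * np.sin(2 * np.pi * 1000 * t)       # a half-scale 1 kHz tone

bits = sdm_first_order(x)

# The spectrum of the 1-bit stream shows the tone at 1 kHz plus a quantization
# noise floor which rises steadily towards the megahertz region, instead of
# lying flat across the band the way plain (unshaped) quantization noise would
spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(bits * np.hanning(n))) + 1e-12)
freqs = np.fft.rfftfreq(n, d=1 / fs_dsd)
```

A higher-order modulator does the same job far more aggressively: the audio band gets quieter still, and the ultrasonic region correspondingly noisier.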

This is where DST compression comes in.  Recall that DST compression works by storing a set of numbers that enable you to reconstruct a close approximation of the original data, plus all of the differences between the reconstructed bit stream and the original bit stream.  Obviously the size of the compressed (DST-encoded) file will be governed to a large degree by how much data is needed to store the difference signal.  It turns out that the set of numbers that reconstruct the ‘close approximation’ do a relatively good job of encoding the low frequency data, but a relatively poor job of encoding the high frequency data.  Therefore, the more high frequency data is present, the more additional data will be needed to encode the difference signal.  And the larger the difference signal, the larger the compressed file will be.  In the extreme, the difference signal can be so large that you will not be able to achieve much compression at all.
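
This is easy to demonstrate with a toy experiment (Python once more, with zlib standing in for DST – which it most certainly is not – though the underlying point about noise-like, high-frequency content survives the substitution):

```python
import numpy as np
import zlib

rng = np.random.default_rng(1)
n_bits = 8 * 200000

# Stream A: a 1-bit stream dominated by slow, predictable structure
slow_bits = (np.sin(2 * np.pi * np.arange(n_bits) / 3000) > 0).astype(np.uint8)

# Stream B: a 1-bit stream that is essentially full-scale white noise
noisy_bits = rng.integers(0, 2, n_bits, dtype=np.uint8)

size_a = len(zlib.compress(np.packbits(slow_bits).tobytes()))
size_b = len(zlib.compress(np.packbits(noisy_bits).tobytes()))
print(size_a, size_b)            # the noise-like stream barely compresses at all
```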

This is the situation we are in with today’s technology.  We can produce the highest quality DSD signal and be unable to compress it effectively, or we can accept a reduction in quality and achieve a useful degree of (lossless) compression.

So what happens when we have a nice high-resolution DSD recording all ready to be sent to the SACD mastering plant?  What happens if the DSD content is too large to fit onto a SACD, and cannot be compressed enough so that it does?  The answer will disappoint you.  What happens is that the high quality DSD master tape is remodulated using a modest 5th-order SDM, in the process producing a new DSD version which can now be efficiently compressed using DST compression.  Most listeners agree that a 5th order SDM produces audibly inferior sound to a good 8th order SDM, but with real music recordings it is essentially impossible to inspect a DSD data file and determine unambiguously what order of SDM was used to encode it.  So it is easy enough to get away with.

How do you tell if a SACD is compressed or not?  Well, if you have the necessary underground tools, you can rip it and analyze it definitively.  For the rest of us there is no sure method, but there is one useful test.  You simply add up the total duration of the music on the disc, and allow for 2,822,400 bits of data per second, per channel.  If the answer amounts to more than 4.7GB then the data must be compressed.  If it adds up to less, there is no guarantee that it won’t be DST-compressed, but the chances are pretty good that it is not.  After all, if the record company doesn’t need to compress it, that is one more job they would have to pay someone to do, and that probably ain’t gonna happen.  The other simple guideline is that if it is multi-channel it is probably compressed, but if it is stereo it probably is not.
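
The arithmetic is easily done on the back of an envelope, but here it is spelled out (Python, with hypothetical disc lengths of my own choosing):

```python
# Raw (uncompressed) DSD payload: 2,822,400 bits per second, per channel
def raw_dsd_gigabytes(duration_minutes, channels):
    bits = duration_minutes * 60 * channels * 2_822_400
    return bits / 8 / 1e9

print(raw_dsd_gigabytes(75, 2))  # ~3.2 GB: a 75-minute stereo program fits uncompressed
print(raw_dsd_gigabytes(75, 6))  # ~9.5 GB: 75 minutes of 5.1 cannot fit in 4.7 GB without DST
```

(Bear in mind that a multi-channel SACD typically carries a separate stereo program as well, so in practice the two figures would need to be added together.)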

Of course, none of this need apply to downloaded DSD files.  If they come from reputable studios they will have been produced using the best quality modulators those studios can afford, and since DST encoding is not used on commercial DSF and DFF* files this whole issue need not arise.  However, if the downloaded files are derived from a SACD (as many files which are not distributed by the original producers are), then the possibility does exist that you are receiving an inferior 5th-order remodulated version.  The take-away is that not all DSD is created equal.  Yet another thing for us to have to bear in mind!

[* Actually, the DFF file format does allow for the DSD content to be DST compressed, because this format is used by the mastering house to provide the final distribution-ready content to the SACD disc manufacturing plant.  However, for commercial use, I don’t think anybody employs DST compression.]