Nothing whatsoever to do with audio, this post. Question: How do you know when an Italian is speaking? Answer: His arms are moving!
That’s a stereotype as old as the hills, and one with a lot of truth to it. The fact that people – of all nationalities – tend to wave their arms and gesture as they speak is something that has always fascinated me. The fact that I do it myself, despite being stolidly Anglo-Saxon, fascinates me just as much. I find this aspect of human behaviour endlessly amusing. For example, what is the purpose of gesturing extravagantly while talking on your cell phone? Why do I do it myself from time to time?
This morning, while working out on my cross-trainer, I think I came up with a crazy insight, and I thought I would share it with you. On the TV set was a travelogue show. The host was taking a leisurely stroll with a guest, discussing the history of Buenos Aires, as it happens. The guest was doing most of the talking. At the beginning his arms remained at his sides. Then they began to gradually lift in front of him as he spoke, and finally began to adopt mild gestures.
I considered the mechanics of walking, talking, and gesturing. What else is there to do when you’re stuck on a treadmill? Let’s start with talking. In order to talk, you need to establish an overpressure in your diaphragm to drive the vibrations in your vocal cords. The process of tensing your diaphragm involves tensing your abdominal muscles. Try saying something out loud right now, and note how your diaphragm and abs both tense up. When you are standing up, and also when you are walking slowly, your abdominal muscles are also part of the process of staying in balance. They will be more tense, in general, than when you are sitting down.
So now imagine you are standing up, maybe even walking, and decide you are going to say something. The first thing that happens is that your diaphragm tenses up to supply an overpressure. This requires your abs to tighten slightly. The tightening of your abs causes your upper body to want to bend slightly forward. But you don’t want to tip forward, so your autonomic nervous system automatically compensates by raising your arms in front of you. The angular momentum of your arms rising in front of you counterbalances the angular momentum of your upper body bending forward, and this balance means that you don’t tip over.
Now you start to actually speak. This involves temporarily reducing the overpressure in your diaphragm to allow a controlled release of air through the vocal cords. The reduced overpressure is accomplished, at least partially, by releasing the tension in the abs. This then releases the forward bend in the upper body. The raised arms now need to begin to lower again to provide the angular momentum to counterbalance it.
So here is the summary of what I have just described. When a person starts the process of speaking, his arms first come up. With each utterance the arms gesture forwards again, and in the pauses between utterances come back up again. When the speaking is over, the arms can come back down.
What about the TV guest in Buenos Aires? Well, I think that it all boils down to your core body strength and endurance. If you are in good shape, and particularly if your core is in good shape, your body is less likely to tilt in response to a slight tightening of the abs. Your back and other core muscles will tend to compensate automatically. But if not, then as you stroll slowly along, chatting as you go, your abs are being lightly exercised, and after a while your core muscles will gradually tire, and you will need to use your arms to assist. This is what happened to the Argentinian, who did not give the impression of being particularly buff. At the start of his short stroll, he needed no arm assist. Then, as he tired, his arms would raise – barely so – as he spoke. By the end of the chat, his arms were all the way up, and he was gesturing with each utterance.
My thought is that the root of gesturing as we speak – which is common to all cultures and not just Italians – must lie in some sort of bio-mechanical response such as this. I thought that was a pretty cool idea. Any anthropologists reading this?
Positioning your loudspeakers in your listening room for optimum performance is an arcane art. Three factors must be taken into account. First, you want to avoid exciting your listening room’s natural bass resonances; second, you want to throw a good and accurate stereo image; and third, there will be any number of purely practical considerations that you cannot avoid and have to work around – for example, it’s best if you don’t block the doorway.
The first of these factors is well understood, although not to the extent that the correct answer can be exactly derived a priori. The final solution will depend on the acoustic properties of each of the listening room’s walls, floor, and ceiling, as well as the speaker’s dispersion pattern, and not forgetting all of the furnishings. There is a commonly adopted tool called the Rule of Thirds, where the speakers are each placed one third of the way into the room from the side wall, and one third of the way in from the back wall. The listener then sits one third of the way in from the wall behind him. This is usually a good place to start. A variant of this rule is the Rule of Fifths, which is pretty much the same with all the “Thirds” replaced by “Fifths”. But this post is not about this aspect, so let’s move on.
The third factor is also something that I cannot help you with in a Facebook post. You had best check out Good Housekeeping, or something. So this post is going to focus on the second factor, obtaining a good stereo image – indeed this post is about one very specific aspect of it.
It turns out that, of all the factors via which speaker positioning affects the creation of a solid stereo image, the most important is usually the so-called First Reflection. Sound leaving the loudspeaker cones makes its way to your ear either directly, or by reflection off the walls, floor, and ceiling. The sound can take many paths, bouncing all round the room, but the most important path is the one that travels from speaker to ear via a bounce off one surface only. In most listening rooms, the ceiling is generally flat and unobstructed and is the same across the whole room. Therefore the sound from either speaker will bounce off the ceiling in a similar manner on its journey to your ear. As a consequence it does not generally impact the stereo image, at least to a first approximation. The same can be said for the floor, although in most situations furniture and carpeting can interrupt the path. However, the walls affect the speakers asymmetrically. The right speaker is close to the right wall but far from the left wall, so its First Reflection will be dominated by reflections off the right wall. The opposite will be the case for the left speaker. This asymmetry is partly responsible for the perceived stereo imaging of the system.
There are two things the user can do to control these First Reflections from the side walls. The first is to adjust the proximity of the speaker to the side wall. The closer it is, the less the time delay between the arrival at the ear of the direct signal and the reflected signal. The second is to adjust the speaker’s toe-in (the extent to which the speaker points towards the centre of the room rather than straight along the room’s axis). Unless you have a true omnidirectional loudspeaker, the speaker’s horizontal dispersion pattern will peak at its straight-ahead position, gradually falling off as you move to one side or the other. Therefore, the amount of toe-in controls the proportion of the reflected signal to the direct signal. If your listening room has sonically reflective side walls (plain painted walls, for example), you will probably require a greater degree of toe-in than if you have heavily textured wallpapered side walls, or furniture that will scatter the sound as opposed to reflecting it (such as bookshelves).
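The geometry behind the delay can be sketched with the image-source method, a standard acoustics technique that the post itself does not describe: mirror the speaker in the side wall, and the single-bounce path to your ear is simply the straight-line distance from that mirror image. This is a minimal sketch, assuming a flat, perfectly reflecting side wall at x = 0, room coordinates in metres, and a speed of sound of 343 m/s; the function name is hypothetical. It also reports where along the wall the First Reflection strikes.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: roughly room temperature air


def first_reflection(speaker_xy, listener_xy, wall_x=0.0):
    """Direct path vs. single bounce off a side wall at x = wall_x (image-source method).

    Returns (direct_m, reflected_m, extra_delay_ms, strike_y_m), where strike_y_m
    is the position along the room's length at which the reflection hits the wall.
    """
    sx, sy = speaker_xy
    lx, ly = listener_xy
    # Mirror the speaker in the wall plane; the reflected path length equals
    # the straight-line distance from this image to the listener.
    image_x = 2.0 * wall_x - sx
    direct = math.hypot(lx - sx, ly - sy)
    reflected = math.hypot(lx - image_x, ly - sy)
    extra_delay_ms = (reflected - direct) / SPEED_OF_SOUND_M_S * 1000.0
    # The image-to-listener line crosses the wall plane at the strike point.
    t = (wall_x - image_x) / (lx - image_x)
    strike_y = sy + t * (ly - sy)
    return direct, reflected, extra_delay_ms, strike_y


# Hypothetical room: listener at (2.0, 3.5); speaker 1.0 m, then 0.6 m, from the side wall.
far = first_reflection((1.0, 1.0), (2.0, 3.5))
near = first_reflection((0.6, 1.0), (2.0, 3.5))
print(f"extra delay: {far[2]:.2f} ms at 1.0 m, {near[2]:.2f} ms at 0.6 m")
print(f"reflection strikes the wall {far[3]:.2f} m along the room")
```

With these made-up positions the reflection lags the direct sound by roughly 3.5 ms with the speaker 1.0 m from the wall, and by roughly 2.2 ms at 0.6 m, which is the "closer means less delay" behaviour described above. Real walls are neither flat nor perfectly reflective, so treat the numbers as a guide to the trend only.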
I have attached two photographs of my own listening room. The side walls are flat and painted, and are quite reflective, therefore my loudspeakers have quite a large degree of toe-in. Also, along the wall beside one of them I have a glass door close to the speaker. With the door closed, the First Reflection comes off a combination of the wall and the glass door. However, with the door open, the First Reflection comes off a combination of the glass door and a big hole (which clearly does not reflect at all). Therefore, on my system, the imaging is severely impacted if I listen with the door open.
The other thing you need to bear in mind is that your best strategy is to control these First Reflections rather than work to merely eliminate them. As a rule, placing highly absorbent panels right where the Reflection is going to strike is not going to help the sound too much. The fact that reflections are so important generally means that you don’t want your room to be too acoustically dead. An empty room with painted flat walls can have a horrible echoing acoustic, but it only takes a small amount of furnishing to break it up. The echoing or “liveliness” of a room is usually measured by a property called RT60. This is the time it takes for a reverberation in the room (caused, for example, by clapping your hands) to fall to 60dB below its initial value. A good number for a listening room would be 0.3 – 0.5 seconds. If your room has a larger RT60 value, then you will probably need to deaden it with a judiciously placed acoustic panel. But how big of a panel, and where to place it, is a very complicated subject in itself. My room has a big absorbing panel, about 6’ x 4’, affixed to the ceiling between and behind the speakers. I also prefer to listen with the heavy floor-to-ceiling curtains on the wall behind my listening chair drawn.
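The RT60 definition above translates directly into a measurement. A minimal sketch, assuming you already have the decay envelope of a hand clap (or similar) as sound levels in dB at a series of times: fit a straight line to the decay and extrapolate how long it would take to fall by 60 dB. Fitting only the first part of the decay and extrapolating is a common practical approach, since the tail of a real decay disappears into the room’s noise floor. The function name and the synthetic data are illustrative.

```python
def estimate_rt60(times_s, levels_db):
    """Estimate RT60 from a decay curve by a least-squares straight-line fit.

    Fits level = a + slope * t to the (time, dB) points, then extrapolates the
    time needed to fall 60 dB at that rate.
    """
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_l = sum(levels_db) / n
    cov = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_s, levels_db))
    var = sum((t - mean_t) ** 2 for t in times_s)
    slope_db_per_s = cov / var  # negative for a decaying sound
    if slope_db_per_s >= 0:
        raise ValueError("levels do not decay")
    return 60.0 / -slope_db_per_s


# Synthetic decay falling at 150 dB/s, observed for 0.2 s (i.e. only 30 dB of decay).
times = [i * 0.01 for i in range(21)]
levels = [-150.0 * t for t in times]
print(f"RT60 ~ {estimate_rt60(times, levels):.2f} s")
```

The synthetic room above comes out at 0.4 s, inside the 0.3 – 0.5 second range suggested for a listening room. A real measurement would be noisier, and the fit would usually be restricted to the clean, straight part of the decay.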
Of course, every time you make a significant change to the acoustics of your listening room, the chances are good that you are going to need to reposition your speakers. Changes that affect the RT60 may well impact the optimum positioning, so you may have to go through the whole procedure again. Reposition, then fine-tune the toe-in and the tilt. My B&W 802 Diamonds weigh 160lb each, and are the most cussedly awkward things to grasp if you ever want to move them, so that is something I don’t like to get involved with on a whim. Because of the First Reflection factor, if your listening room is such that the First Reflection surface has a high acoustic reflectivity, then be aware that the distance of the speaker from the side wall will probably have to be set to an accuracy of half an inch. Likewise, the toe-in and tilt can require great precision for optimal results.
If your loudspeakers are not set up to image as well as they can, then you are going to find it that much harder to optimize other aspects of your system setup.
There was a time – and this may surprise you – when a Hi-Fi reviewer’s job was to install whatever he was reviewing on his lab bench and measure the bejesus out of it. When I first got into Hi-Fi, in England back in the 1970s, one of the senior reviewers in the vibrant Hi-Fi magazine scene was Gordon J. King. Gordon lived close by and I got to meet him. Gordon would never dream of connecting an amplifier to a pair of speakers and playing music through it. He would measure power output, distortion, frequency response, anything he could put a number to. But he would never let it near his sound system (which was pretty weird, and which he never did let me listen to).
When Naim released the radical 250 Power Amplifier, Julian Vereker didn’t pause to think before sending one out to Gordon for review. Now, the first 250s had a tendency to oscillate in the ultrasonic without an inductive load. In fact, the user manual went to great lengths to specify the type of speaker cable which was necessary to avoid this problem in practice. Not a person to pay any attention to matters as mundane as loudspeaker cables, Gordon immediately installed the 250 on his lab bench and connected a rheostat across its output terminals, which, for the duration of his test, was all he would ever connect to it. Needless to say, it measured terribly, right up until he blew out its output stage measuring its power delivery capability. It was sent back to a horrified Julian Vereker, who repaired it and sent it back. It blew up for a second time. Gordon gave the Naim 250 a terrible review.
At one point, after he had retired, Gordon gave me a high-end Denon receiver, a product he considered one of the best amplifiers he had ever reviewed. That Denon sounded absolutely appalling when I hooked it up. I gave it back. As life would have it, I replaced the Denon with … a Naim 250. It was absolutely superb sounding.
A few years earlier, “Hi-Fi Answers” was one of the many UK Hi-Fi magazines sold in the high street newsagents. It was not particularly notable, but its hook was an expanded Q&A section where readers could write in for advice. In about 1980, Keith Howard took over as Editor, and soon Hi-Fi Answers had a radical editorial makeover. Word got around that every single question that was posed on their Q&A pages was answered with instructions to purchase a Linn Sondek turntable, Linn speakers, and Naim amplification. It didn’t seem to matter what the question was, the answer was always “Linn/Naim”. Additionally, Hi-Fi equipment was now reviewed solely by listening to it, with not a single measurement playing any role in the evaluation process. It really was quite a radical departure, back in those days, to talk about how an **amplifier** sounded! Let alone a turntable, or a tonearm. Finally, they propounded a radical new philosophy of “source first”, where the most important component in a Hi-Fi system was the turntable, followed, in order, by the tonearm, cartridge, preamp, power amp, and loudspeakers. All this was almost a total inversion of the received wisdom of the day. This radical approach interested a young me, as I had by that time gone through many stages of incremental system upgrades. Each time the system indubitably sounded better after the upgrade, but after the new car smell wore off I was left with the uneasy feeling that nothing much of substance had actually changed. I could hear apparently higher fidelity, but the new system never really rocked my boat any more than did the previous incarnation. Meanwhile, Hi-Fi Answers promised audio nirvana if only I would buy a Linn Sondek. It was time I found out what all the fuss was about.
I found myself in London one weekday afternoon, so I figured I could spend some time in one of the city’s Linn dealers and I wouldn’t be getting in anybody’s way. I can’t remember its name, but I had read that Hi-Fi Answers’ Jimmy Hughes, the leading light of the new “just listen to it” school of equipment reviewing, used to work there. One of the sales staff duly introduced himself and inquired what I was looking for. I explained. He installed me in one of their many private (single-speaker) listening rooms and spent about two hours giving me a one-on-one lesson in listening to Hi-Fi. It went like this. I happened to have a couple of albums that I had just bought. One of them was a Sibelius Violin Concerto, although I don’t remember who the violinist was. He started off by asking why I had bought that record. This was an immediate problem, since I had only bought it because it was in a clearance bin and seemed worth a punt. But, surrounded by equipment I could never afford, and a smoothly urbane salesman I didn’t want to offend, I really didn’t want to say that. So I offered some appropriate-sounding platitudes. The salesman wouldn’t give up on it, though – he wanted to play it for me on a popular low-end turntable, and we duly listened for a while. At the end, he interrogated me on my thoughts regarding the soloist’s performance. Bless him, the salesman listened patiently to my clearly clueless response. I had no real opinion regarding the soloist’s performance, and I’m sure the salesman knew it. Now we switched the record to a Linn Sondek turntable, fitted with a Linn Ittok arm and a Linn Asak cartridge. I was asked to listen again, and answer the same question.
During those first 10 minutes of exposure to the Linn, I got it. It was a jaw-dropping experience. All of a sudden, everything made sense. Like being struck by Cupid’s arrow, I immediately knew that the “source first” concept was the real deal, and that the Linn was for me. The salesman took me through many more albums, each one carefully chosen to illustrate a point he wanted to make. We listened to Sondeks with different arms and cartridges. Each point he wanted to make was a lesson I needed to absorb.
What I learned in that store that afternoon has been the basis of my approach to Hi-Fi ever since, and I don’t feel it has ever let me down. And I have no intention of trying to set it out in print here, because words alone can’t and don’t fully capture it. Only the experience does. Only the experience can, and preferably with the assistance of a really good teacher. But if I could distill the essence of it, it would be this: Does the performance **communicate** with you? The value of music cannot lie solely within the notes and words, but must derive from the performers’ interpretation of them. Sure, it takes technical chops to perform the piece, but what makes it worth listening to in the first place should be the same as what makes it worth committing to tape in a studio. The performer must surely have something to say, so is he **communicating** that to you as you listen?
I ended up confessing to the salesman that I could not remotely afford a Linn Sondek, and he was cool with that. But I did start saving, and in a little over a year I bought a Rega Planar 3 turntable, and a little over a year after that, replaced it with a Linn Sondek. My journey, which had begun about eight years earlier, only now started to make real forward progress. It was shortly after taking possession of the Sondek that Gordon J. King gave me the Denon receiver. And it was after I gave it back to him that I wrangled a Naim Preamp and a 250 on long-term loan. I finally had that “Linn/Naim” system. Eventually, the Linn and Naim were both replaced, but now each upgrade came with a concomitant and lasting improvement in the pleasure to be had from the system.
Back then, the Hi-Fi world was different to what it is now. There were a very small number of manufacturers offering equipment with truly high-end performance, and a large majority whose products fell seriously, seriously short. It was a market in which the “Linn/Naim” message could – and did – resonate. Today, the picture is very different. You have to go a long way to find a truly bad product, and the choice of seriously, seriously good equipment can be almost bewildering. You know, as I write this, it occurs to me that maybe life was indeed much simpler when all you needed to know was “Linn/Naim”, “Linn/Naim”, and “Linn/Naim”. Nostalgia ain’t what it used to be.
When I was a kid, growing up in a rough area of Glasgow, we were all taught music at school – even at elementary school. I have a memory going back to about age eight, sitting in a classroom that was right next to the school gym. I recall it containing gym equipment. And I recall the teacher writing two very strange words on the blackboard – “Beethoven” and “Mozart”. Frankly, I don’t remember much else about it. I do know that we were taught the so-called “Tonic Solfa” – Do, Re, Mi, Fa, So, La, Ti, Do, which is in musical parlance the major scale. On a piano keyboard this is easily played as C, D, E, F, G, A, B, C. I think it is sad that this sort of thing is no longer taught in most schools as part of the core syllabus.
I think we also all know that those notes I mentioned form only the white keys on the piano keyboard, and that there are also black keys that sit between them, set back slightly from the front of the keyboard. Every pair of adjacent white notes has a black note between them, save for E/F and B/C. This gives the piano keyboard its characteristic pattern of black keys, which alternate up and down the keyboard in groups of two and three. It is this breakup of the symmetry that allows us to immediately identify which note is which. For instance, the C is the white key immediately to the left of the group of two black keys. The other thing most of us know is that every black note has two names – the black note between C and D can be called either C-sharp (written C#) or D-flat (written D♭). And if you didn’t know that before, well you do now!
Any performing musician will tell you that it is critically important to get your instruments in tune before you start playing. And if you are in a band, it is important that all instruments are in tune with each other. Some instruments (most notably stringed instruments) have a propensity to go out of tune easily and need frequent tune-ups, some even during the course of a performance. Even the very slightest detuning will affect how the performance sounds. Let’s take a close look at what this tuning is all about, and in the process we will learn some very interesting things.
Something else that I think you all understand is that the pitch of a note is determined by its frequency. The higher the frequency, the higher the note. And as we play the scale from C to the next C above it (I could denote those notes as C0 and C1 respectively), we find that the frequency of C1 is precisely double the frequency of C0. In fact, each time we double the frequency of any note, what we get is the same note an octave higher. This means, mathematically, that the notes are equally spaced on a logarithmic frequency scale. If we arbitrarily assign a frequency to a specific note by way of a standard (the musical world now defines the A above middle C as 440Hz, but to keep the arithmetic simple I will use 400Hz for A throughout this post), we can therefore attempt to define the musical scale by giving each of the 12 adjacent notes in an octave (7 white notes and 5 black notes) frequencies which are separated by a ratio given by the 12th root of 2. If you don’t understand that, or can’t follow it, don’t worry – it is not mission-critical here. What I have described is called the “Equal-Tempered Scale”. With this tuning, any piece can be played in any key and will sound absolutely the same, apart from the shift in pitch. Sounds sensible, no?
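To make the Equal-Tempered recipe concrete, here is a minimal Python sketch that builds the 12 notes of one octave upward from A, using this post’s round-number 400Hz reference rather than 440Hz concert pitch. The note-name list (sharps only) and the function name are illustrative choices, not any standard API.

```python
REFERENCE_A_HZ = 400.0  # this post's round-number A; concert pitch A is 440 Hz

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def equal_tempered(reference_hz: float = REFERENCE_A_HZ) -> dict[str, float]:
    """Frequencies of the 12 notes from A upward, each 2**(1/12) above the last."""
    return {name: reference_hz * 2.0 ** (i / 12.0) for i, name in enumerate(NOTE_NAMES)}


for name, freq in equal_tempered().items():
    print(f"{name:>2}: {freq:7.2f} Hz")
```

With this tuning C# comes out at about 503.97Hz, and E lands just below 600Hz rather than exactly on it. Twelve steps of 2**(1/12) multiply to exactly 2, which is why the octave closes perfectly.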
As I mentioned earlier, if you double the frequency of a note you get the same note an octave higher. If you triple it, you get the note which is the musical interval of “one fifth” above that. In other words, if doubling the frequency of A0 gives us A1, then tripling it gives us E1. By the same logic, we can halve the frequency of E1 and get E0. So, multiplying a frequency by one-and-a-half times, we get the note which is a musical fifth above it. Qualitatively, the interval of one-fifth plays very harmoniously on the ear, so it makes great sense to use this simple frequency relationship to provide an absolute definition for these notes. So now we can have A0=400Hz and E0=600Hz.
The fourth harmonic of 400Hz is another A at 1600Hz, so let’s look at the fifth harmonic. This gives us the musical interval of “one third” above the fourth harmonic. This turns out to be the note C#2. So we can halve that frequency to get C#1, and halve it again to get C#0. The notes A, C#, and E together make the triad chord of A-Major, which is very harmonious on the ear, so we could use this relationship to additionally define C#0=500Hz.
We have established that we go up in pitch by an interval of one-fifth each time we multiply the frequency by one-and-a-half times. Bear with me now – this is what makes it interesting. Starting with A0 we can keep doing this, dividing the answer by two where necessary to bring the resultant tone down into the range of pitches between A0 and A1. If we keep on doing this, it turns out we can map out every last note between A0 and A1. The first fifth gives us the note E. The next one B. Then F#. Then C#. Let’s pause here and do the math. This calculation ends up defining C# as 506.25Hz. However, we previously worked out, by calculating the fifth harmonic, that C# should be 500Hz! Why is there a discrepancy? In fact, the discrepancy only gets worse. Once we extend this analysis all the way until we re-define A, instead of getting 400Hz again we end up with 405.46Hz. And what about the “Equal-Tempered Scale” I mentioned earlier – where does that fit in? That calculation defines a frequency for C# of 503.97Hz.
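The arithmetic above is easy to check by machine. This sketch uses Python’s `fractions.Fraction` so that the stacked 3/2 ratios stay exact, keeps this post’s 400Hz reference for A, and compares three candidate values for C#: four stacked fifths, the folded-down fifth harmonic, and four equal-tempered semitones. The helper names are hypothetical, chosen for illustration.

```python
from fractions import Fraction

A_HZ = Fraction(400)  # this post's reference A


def fold_into_octave(freq: Fraction, base: Fraction = A_HZ) -> Fraction:
    """Halve or double freq until it lies in the octave [base, 2 * base)."""
    while freq >= 2 * base:
        freq /= 2
    while freq < base:
        freq *= 2
    return freq


def stack_fifths(count: int, base: Fraction = A_HZ) -> list[Fraction]:
    """Climb `count` perfect fifths (ratio 3/2) from base, folding each into one octave."""
    notes = [base]
    for _ in range(count):
        notes.append(fold_into_octave(notes[-1] * Fraction(3, 2), base))
    return notes


chain = stack_fifths(12)                       # A, E, B, F#, C#, G#, D#, A#, F, C, G, D, A
pythagorean_c_sharp = chain[4]                 # four fifths up: A -> E -> B -> F# -> C#
harmonic_c_sharp = fold_into_octave(5 * A_HZ)  # fifth harmonic, brought down two octaves
equal_tempered_c_sharp = 400 * 2 ** (4 / 12)   # four equal semitones above A

print(float(pythagorean_c_sharp), float(harmonic_c_sharp),
      round(equal_tempered_c_sharp, 2), round(float(chain[-1]), 2))
# 506.25 500.0 503.97 405.46
```

The last value is the "A" reached after twelve fifths: it overshoots 400Hz by the ratio 3^12 / 2^19, the gap that no choice of tuning can make disappear.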
The problem lies in the definition of the interval of one-fifth. On one hand we have a qualitative definition that we get by observing that a note will play very harmoniously with another note that has a frequency exactly one-and-a-half times higher. On the other, we have a more elaborate structural definition that says we can divide an octave into twelve equally-spaced tones, name those tones A through G plus some black notes (sharps/flats), and define one-fifth as the interval spanning seven of those equal steps. I have just shown that the two are mathematically incompatible: seven equal steps give a frequency ratio of 2 to the power 7/12, which is about 1.498 rather than exactly 1.5. Our structural approach gives us a structure where we can play any tune, in any key, and defines an “Equal-Tempered” scale, but our harmonic-based approach is based on specific intervals that “sound” better. How do we solve this conundrum?
This was a question faced by the early masters of keyboard-based instruments, where each individual note can be precisely tuned at will to a degree of precision that was not previously attainable by other instruments. All this took place in the early part of the 18th Century, back in the time of our old friend Johann Sebastian Bach. It turns out they were very attuned to this issue (no pun intended). The problem was, if you tuned a keyboard to the “Equal-Tempered” tuning, then pieces of real music played on it did not sound at all satisfactory. So if the “Equal-Tempered” tunings sounded wrong, what basis could you use to establish something better? There isn’t a simple answer for that. Every alternative will, by pure definition, have the property that a piece played in one key will sound slightly different played in another key. What you want is for each key to have its own sound, which we accept may differ in character from key to key, but for none of them to sound “bad” in the way that the “Equal-Tempered” tuning does.
This problem shares many aspects with the debate between advocates of tube vs solid-state amplifiers, of horn-loaded vs conventionally dispersive loudspeakers, even of digital vs analog. If the solution is to be found in a consensus opinion of a qualitative nature, there is always going to be a divergence of opinion at some point. In Bach’s time, there was a consensus which emerged in favour of what is termed “Well-Tempered” tuning. I won’t go into the specifics regarding how that particular tuning is derived, but in short this is now the basis of all modern Western music. Bach wrote a well-known collection of keyboard pieces titled “The Well-Tempered Klavier” whose function is to illustrate the different tonal character of the different musical keys which arise from this tuning.
One thing which emerges as a result of all this is that the tonal palette of a composition is determined, to a certain degree, by the key in which it is written. This is what is behind the habit classical composers have of identifying their major works by the key in which they are written. You may have wondered why Beethoven’s ninth symphony was written in D-Minor, or, given that it had to have been written in some key, why the key always gets a mention. If so, well now you know.
Here is a web site that explores the “character” of each of the different keys. Of course, since this is a purely qualitative assessment, YMMV. Enjoy!…
Here is something that a lot of people either don’t know or didn’t realize. How is silence encoded in DSD?
You may remember reading my post a while back called Two’s Complement. This is where I introduced the concept of the Signed Integer. We allow the most significant bit of an N-bit number to encode whether the number is positive or negative, and the remaining N-1 bits, read in two’s complement form, encode its value. Positive numbers go from zero upwards, and negative numbers from -1 downwards. The key point here is that we have an unambiguous representation of zero.
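As a quick refresher, here is a minimal Python sketch of 8-bit two’s complement encoding and decoding. The function names are hypothetical; Python’s own integers are unbounded, so the masking below simulates a fixed-width register.

```python
def to_twos_complement(value: int, bits: int = 8) -> int:
    """Encode a signed integer as an unsigned, fixed-width two's complement pattern."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern: int, bits: int = 8) -> int:
    """Decode: if the most significant bit is set, the value is negative."""
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern


for v in (0, 1, 127, -1, -128):
    print(f"{v:5d} -> {to_twos_complement(v):08b}")

# Exactly one of the 256 patterns decodes to zero.
print([p for p in range(256) if from_twos_complement(p) == 0])
```

Note that -1 comes out as all ones, and that zero has a single representation, which is the property DSD’s lone bit cannot offer.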
But DSD has only one bit. If we assign that bit as the sign, then there are no bits left to assign the magnitude. Instead a value of “1” represents the highest possible voltage (nominally +1V) and a value of “0” represents the lowest possible voltage (nominally -1V). Zero volts is smack dab in between.
You will recall that a convenient way to view DSD is to think of the signal as being the average value of a sequence of consecutive bits in the DSD bitstream. Therefore, any bit pattern can be used to represent zero, so long as it contains the same number of ones as zeros. And that’s more or less right, but we have to understand what that means in practice, so that, if the need should arise (which it does, as we shall see shortly), we understand how to best represent zero.
The simplest representation sounds like it ought to be 10101010. Since DSD is a 2.8MHz 1-bit data stream, this particular sequence actually encodes a 1.4MHz signal at maximum volume. It seems very bizarre that to encode silence, we have to do this by instead encoding the highest possible frequency at the highest possible volume. Bizarre, but true. It works because when this bitstream gets to the DAC, the required low-pass filter will attenuate the 1.4MHz component out of existence. We could also use a sequence like 10110010 which works just as well. In fact it is arguably slightly better because the high frequency content is slightly lower in amplitude, although it is spread out over more frequencies. This is a choice you get to make – whether to encode silence as a high level signal at a frequency furthest away from the audio band, or as a range of lower level signals at frequencies a little closer to the audio band. There is no single right answer.
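Both claims can be checked numerically. This minimal Python sketch treats each pattern as repeating forever, maps "1" to +1 and "0" to -1, and runs a plain DFT over one period to get the average (DC) level and the amplitude of every tone present. It assumes the DSD64 sample rate of 2,822,400Hz (64 × 44.1kHz), which is the "2.8MHz" above; the function names are hypothetical.

```python
import cmath

DSD64_RATE_HZ = 2_822_400  # assumption: DSD64, i.e. 64 x 44.1 kHz


def bits_to_levels(pattern: str) -> list[float]:
    """Map DSD bits to nominal output levels: '1' -> +1.0, '0' -> -1.0."""
    return [1.0 if b == "1" else -1.0 for b in pattern]


def dc_level(pattern: str) -> float:
    """Average level of the repeating pattern; 0.0 means it encodes silence."""
    levels = bits_to_levels(pattern)
    return sum(levels) / len(levels)


def tone_amplitudes(pattern: str, rate_hz: int = DSD64_RATE_HZ) -> dict[float, float]:
    """Peak amplitude of each non-DC frequency present in the endlessly repeated pattern."""
    x = bits_to_levels(pattern)
    n = len(x)
    result = {}
    for k in range(1, n // 2 + 1):
        X = sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        # The Nyquist bin (k = n/2) has no mirror image, so it is not doubled.
        scale = 1.0 / n if 2 * k == n else 2.0 / n
        amplitude = abs(X) * scale
        if amplitude > 1e-9:
            result[k * rate_hz / n] = amplitude
    return result


for p in ("10101010", "10110010", "00000000"):
    tones = {f"{f / 1e3:.1f} kHz": round(a, 3) for f, a in tone_amplitudes(p).items()}
    print(p, dc_level(p), tones)
```

Both balanced patterns average to exactly zero. 10101010 puts a single full-scale tone at 1411.2kHz, while 10110010 has half that amplitude at 1411.2kHz plus smaller tones at 352.8, 705.6 and 1058.4kHz, none reaching full scale. By contrast, 00000000 contains no tone at all: it is a steady level of -1, full-scale negative.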
I said this question comes up as a practical matter, and indeed it does. The specification for the DSD file format chooses to break it up into chunks of about 4kB each, and does not allow for smaller chunks. However, a DSD bitstream can be of arbitrary length, and so if it is to be encoded into the approved file format, it needs some extra signal to be appended to the end of it to bring the last chunk up to its required 4kB size. Obviously, this extra padding needs to be silence. But which specific representation of DSD silence does the DSD file format specification tell us to pad it out with? The answer – quite incredibly – is with zeros. It is quite specific about that. But 00000000 does not encode silence in the DSD world. It encodes full scale negative voltage. In fact, for reasons I won’t elaborate here, it is worse than that – it encodes a negative voltage which is deep into clipping. When you play this back, the result is not silence, but **BANG!!!**. Yes – crazy but true – the specification for the DSD file format calls for every track to be padded out with a digital signal which could propel your tweeter dome across the room!
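Here is a minimal Python sketch of the padding step, contrasting the specified zero padding with a balanced silence byte. It assumes blocks of exactly 4096 bytes (the "about 4kB" above) and uses 0xAA, the 10101010 pattern, as silence; both constants and the function names are illustrative assumptions, not taken from the specification. Any balanced byte averages to zero whichever bit order the file uses.

```python
BLOCK_SIZE = 4096          # assumption: exact size of the "about 4kB" block
SILENCE_BYTE = 0b10101010  # 0xAA: balanced bits, averages to zero volts
SPEC_PAD_BYTE = 0x00       # all zeros: full-scale negative, not silence


def pad_final_block(data: bytes, pad_byte: int = SILENCE_BYTE,
                    block_size: int = BLOCK_SIZE) -> bytes:
    """Append pad_byte until len(data) is a whole number of blocks."""
    shortfall = -len(data) % block_size
    return data + bytes([pad_byte]) * shortfall


def average_level(data: bytes) -> float:
    """Mean of the bitstream with '1' -> +1.0 and '0' -> -1.0."""
    ones = sum(bin(b).count("1") for b in data)
    total_bits = 8 * len(data)
    return (2 * ones - total_bits) / total_bits


track_end = bytes([0b10110010]) * 1000  # hypothetical final 1000 bytes of a track
good = pad_final_block(track_end)
bad = pad_final_block(track_end, SPEC_PAD_BYTE)
print(len(good), average_level(good[1000:]), average_level(bad[1000:]))  # 4096 0.0 -1.0
```

The padding is the same length either way; the only difference is whether it averages to zero volts or sits pinned at full-scale negative.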
Do we need to be alarmed? Not really. By now, this problem is well understood, and so playback software such as BitPerfect recognizes these undesirable **BANG!!!** signals and replaces them with something that properly encodes zero. Also, for the most part, I exaggerate for humorous effect, to get my point across. But the reality exists that there are certain speaker designs out there which could be very expensively damaged by “correctly” coded DSD silence which is not properly corrected by the DSD playback software.
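How might a player spot and fix the padding? This is a hypothetical heuristic, not BitPerfect’s actual code: since a long run of 0x00 bytes is a steady full-scale negative level that a DSD modulator would not normally produce for real music, a long trailing run of zero bytes can be treated as padding and replaced with balanced silence. The `min_run` threshold and the 0xAA silence byte are illustrative assumptions.

```python
SILENCE_BYTE = 0xAA  # assumption: 10101010, balanced bits that average to zero


def repair_zero_padding(block: bytes, min_run: int = 16) -> bytes:
    """Replace a trailing run of 0x00 bytes with balanced DSD silence.

    Hypothetical heuristic: a trailing run of at least `min_run` zero bytes is
    assumed to be spec-mandated padding rather than music.
    """
    stripped = block.rstrip(b"\x00")
    run = len(block) - len(stripped)
    if run < min_run:
        return block  # too short to be padding; leave the audio untouched
    return stripped + bytes([SILENCE_BYTE]) * run


last_block = bytes([0b10110010]) * 1000 + bytes(3096)  # music, then zero padding
fixed = repair_zero_padding(last_block)
print(len(fixed) == len(last_block), set(fixed[1000:]))
```

The block keeps its length, so the file structure is unchanged; only the padding’s content is swapped for something that decodes as zero volts.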
Some DSD content producers do the decent thing and prepare DSD files that correct the problem at the source using proper DSD silence, and are therefore, strictly speaking, out of compliance with the specifications. My friend Cookie Marenco of Blue Coast Records is particularly conscientious in this regard. Others continue to produce “BANG-encoded” files in strict adherence to the specs.
You want to know the really sad part about this? The solution is really very simple. All we need is to issue a revision to the DSD file specification which corrects this problem by simply specifying the preferred digital bitstream that should be employed for the purposes of padding silence. It is trivial beyond belief. It requires a 30-second edit to the file spec. Job done. But the people who “own” the file spec have no interest in making this happen. Their position is that it is not a problem because all the player software corrects for it.
These are the people who spent a fortune developing SACD and then botched the launch. Ah well … at least your tweeter dome won’t end up in your coffee mug if you use BitPerfect!
Mahler’s 7th Symphony stands unique among the composer’s symphonic cycle for many, many reasons. Most of all, there remains huge uncertainty over what it is actually about. Does it have an over-arching message, or programme? For the conductor, it presents huge difficulties in determining what it is, musically, that you want your interpretation to say. The magnitude of this uncertainty is not to be underestimated. Indeed, there has been at least one major international conference of musicologists devoted exclusively to analysis and interpretation of this one piece.
What did Mahler think about it? The composer was known to be very particular about his compositions, and was an acknowledged master of complex musical form. Each of his other symphonies has a clearly discernible span, making a journey from one place to another, or examining a set of complex emotions or feelings with great clarity. By contrast, analysts have long pondered the 7th’s 5-movement structure, struggling to tie the meanings of the outer movements to those of the inner three. You would have thought Mahler himself would have recognized these apparent weaknesses, and yet he expressed himself more satisfied with the 7th than with any of his other symphonies. He obviously saw something different in it.
Mahler undertook work on the 7th immediately after finishing his 6th Symphony, a relentlessly somber and anguished composition. Yet none of these tragic elements make their way into the 7th Symphony. It is clearly its own piece, born of its own musical ideas. He began by composing what would become the 2nd and 4th movements, both called “Nachtmusik” – hence giving the Symphony its commonly used sobriquet “Song of the Night”. Between those two is the Scherzo, another sinister-sounding movement of evidently nocturnal character. What ties these three central movements together? The answer to this must surely be the key to unlocking the mystery of the whole symphony. Let’s look at them more closely.
The second movement is a beautifully crafted evocation of a quiet night in the forest. We hear small animals scurrying about, calling to each other. Hints left and right of things that might be happening if only we could see. There is a kind of musical darkness that is evocative without being revealing, if I might put it that way. It is almost a pastoral setting in total darkness. Yet this darkness is one without any sense of menace. Like Beethoven’s 6th Symphony’s stroll through the countryside on a fine summer’s day, this movement is a stroll through the forest in the middle of the night. Humans have a natural trepidation when faced with darkness and night, and this movement seems to want to illustrate that it needn’t be so. It is uplifting music.
But then along comes the Scherzo. Now our community of nightlife is scurrying about with an obvious sense of nervousness, with an unspoken threat of something dangerous lurking unseen and probably very close by. The Scherzo is unsettled from beginning to end. Even as calm tries to break out from time to time, it is a nervous calm, and never seems to entirely free itself from the dangers hiding in the background. But these dangers seem to be content, for the time being, to lurk, and never manage to leap forward and give their fears a name.
The fourth movement is the second “Nachtmusik” movement, and is a different beast entirely. Here the protagonist is taking a leisurely, moonlit, late-evening stroll. The restrained urgency of the forest has gone, along with its menagerie of small furry animals. The feral menace of the Scherzo has evaporated, and instead the charms of the night are assembled to serenade us. We are left with an overwhelming impression of contentment.
These three movements are the core of the Symphony, and were written first of all, with the 1st and 5th movements not being added by Mahler until the following year. I think Mahler had said most of what he wanted to say in these three movements, but realized that they did not stand up on their own as a Symphony without the weighty bookends of suitable opening and closing movements. I think this is what was in his mind when he knocked out the 1st and 5th movements in little over a month in the summer of 1905.
The first movement is one big introduction. It is seen by many analysts as representing daybreak, and indeed it can readily be interpreted – on its own – in that light. But it doesn’t really make a lot of sense to celebrate daybreak before three movements which set about celebrating night. It is my contention that the first movement celebrates not nightfall as such, but – and here there is no word for what I want to say, so I am going to have to make one up – “nightbreak”. We live in a daylight world, and in our world day breaks and night falls. But in a nocturnal world the opposite happens: night breaks and day falls. So the 1st movement of Mahler’s 7th represents “nightbreak” as a dawning process. That it takes its time doing so is necessary mainly for the purpose of creating an opening movement of suitable weight and depth.
The opposite happens in the finale. Here the night gradually gives way to day. The dark tonal colours give way to conventionally brighter ones, and the music works its way to a celebratory conclusion. We have been blessed with another wonderful night, which is now drawing to a close with the dawn, and, God willing, once the day passes the music’s nocturnal protagonist can look forward to the next night.
Because of the interpretive difficulties I have mentioned, there are many different and viable performances of this difficult work available on record. I must admit, this has always been a very tough symphony for me, as nobody has yet come up with an interpretation that – to my ears at least – makes real sense of it. Hand on heart, I would have said that I had not yet heard a single recording I could recommend.
But that has now changed.
I recently posted about the Michael Tilson Thomas recording with the San Francisco Symphony, which is being made available in DSD by Blue Coast Records at a stunning price (until the end of November). It is a stunning recording too. This is finally the definitive Mahler 7th for me. Those of you who already know the piece can be forgiven for wondering what the hell I am talking about in my out-of-left-field analysis. But I think Tilson Thomas just about nails it for me in this recording. In particular, the middle three movements are spectacularly spot on – quite the best I have yet heard. Only the final movement is arguably weak. The first movement is a great exposition of my “nightbreak” Introduction theory – and has the amusing bonus that the famous Star Trek theme which makes its appearance half way through is voiced to sound just like it does on the TV show! I would expect nothing less from a bunch of San Franciscans! The core central movements are breathtakingly magnificent. A truly captivating performance. Well done, Tilson Thomas, and quite unexpected given his unconvincingly austere rendition of the 1st (albeit a superbly recorded, unconvincingly austere rendition), also available from Blue Coast.
I provided a link to it in the previous post I mentioned, so here instead is a link to a YouTube video of Tilson Thomas and the San Francisco Symphony performing the 7th at a Prom concert in London, England a couple of years back. Nowhere near as polished as the recording, from an interpretative standpoint, but still an hour and a half of compelling viewing.
Most of you who do not make a habit of listening to classical music will have heard of a Symphony, and know that it is some sort of portentous orchestral piece listened to by highbrow types wearing appreciative frowns. But I suspect that a much smaller proportion have some clear idea of what a Symphony actually is, and why it is at all important. If you are interested to learn a little more, this post is for you. But be forewarned – I am not a trained musicologist, so if you like what you read here, don’t treat it as gospel, but rather as inspiration to read further, from more authoritative sources.
The term “Symphony” actually has its roots in words of ancient Greek origin originally used to describe certain musical instruments. They have been applied to pipes, stringed instruments, a primitive hurdy-gurdy, and even drums. By the middle ages, similar words were being used for musical compositions of various forms. It is not until the eighteenth century that composers – most prominently Haydn and Mozart – began using the term Symphony to describe a particular form of orchestral composition that we may find familiar today.
Beginning in the Renaissance, the wealthiest European monarchs and princely classes began to assemble troupes of resident musicians in their courts. Although churches had for centuries maintained elaborate choirs, and travelling troubadours had been mentioned in the historical record since time immemorial, it was really only in this period that the concept of what we would now identify as an orchestra began to take shape. Since orchestras had not previously existed, it follows that composers of orchestral music didn’t exist either, and the two had to develop and evolve hand in hand. Court composers composed, as a rule, at their masters’ pleasure. They wrote what they were told to write, rather than what they were inspired to write. The purpose of the “orchestra” was mainly to provide music to dance to, although special pieces were sometimes commissioned from the court composer for ceremonial occasions.
As music and musicianship grew, so the scope of compositions began to grow in order to highlight the advancing skills of the performers. Musical forms began to develop which would showcase these talents, and compositional styles emerged which would enable these performers to express their talents in the form of extended playing pieces where they would elaborate both their own playing skills, and the composer’s evolving compositional ideas. Specialist composers began to emerge, culminating in Johann Sebastian Bach, who would go on to codify many of the compositional and structural building blocks which continue to underpin all western music today. It might surprise many readers to learn that today’s pop & rock music adheres very firmly to the principles first set forth by Bach, far more so than do its modern classical counterparts.
By the late 18th century, specialist composers had fully emerged, brimming – indeed exploding – with musical ideas. Many of those ideas involved utilizing the seemingly unlimited expressive potential of the musical ensemble we call an orchestra, but there were few accepted musical forms which composers could use to realize these ambitions. What emerged was the Symphony. Musical forms did exist for shorter, simpler pieces. What the new classical symphonists did was to establish ways of stitching together groups of smaller pieces to make an interesting new whole, which they called a Symphony.
Haydn and Mozart established that a Symphony could be constructed by taking a simple, but highly structured established form such as a Sonata (think Lennon & McCartney) and combining it first with a slower piece and then with a faster piece by way of contrast, and concluding with an up-tempo musical form (such as a Rondo) which has a propensity to drive towards a satisfying and natural conclusion. Eventually, composers would learn to link the four “movements” together by thematic, harmonic, or tonal elements. In any case, the idea was that the four movements would together express musical ideas that exceeded the sum of their parts.
In the next century, particularly thanks to Beethoven, the Symphony grew to become the ultimate expression of compositional ideas. When a composer designates a work a Symphony, it implies both the deployment of the highest levels of musical sophistication, and great seriousness of purpose. Indeed many composers were (and are) reluctant to apply the term to compositions which in their minds failed to meet their personal expectations of what the form demands.
So what, then, does the form demand? As time has gone on, the answer to that has grown increasingly abstract. In my view, what it demands more than anything else is structure, which sounds terribly pompous, so I need to describe what I mean by that. Structure is the framework upon which the music expresses its message. I think the easiest way to explain that is to listen to the first movement of Beethoven’s 5th symphony (with Carlos Kleiber conducting the Vienna Philharmonic Orchestra, if you can get hold of it). Everybody knows the famous 4-note motif which opens the piece – DA-DA-DA-DAAAAA! – and is then repeated one tone lower. The entire first movement is all about Beethoven explaining to us what he means by that 4-note motif. The piece sets about exploring and developing it in different ways. We hear it in different keys, at different pitches, played by different instruments and by the orchestra in unison, at different tempi, as the main theme and as part of the orchestra’s chattering accompaniment. It starts off famously as an interrogatory statement – three notes and then down a third with a portentous dwell on the fourth note. By the end of the movement the motif has modulated into a triumphant phrase – three notes and then up a fourth, with the fourth note punched out like an exclamation point. The opening of the movement asks a (musical) question, the movement then goes on to explore the matter in some detail, and it finishes with a definitive answer. This is what I mean by structure. By the time the movement is over, I feel I know all I need to know about the 4-note motif, or at least all that Beethoven has to say about it.
A symphony can be a mammoth piece – some are over an hour long. Four movements is traditional, but five or six are common. What is needed to make a symphony work is that its musical message must be properly conveyed across its whole. It needs to feel incomplete if any parts are missing. It needs to feel wrong if the movements are played in the wrong order. And above all it needs to give up its mysteries reluctantly; it doesn’t want to be a cheap date – it wants your commitment too. A symphony is all about that structure, how its musical ideas are developed both within the individual movements, and also across the entirety of the work. These musical ideas may not be overt – indeed they can be so thoroughly hidden that experts have never managed to fully uncover them in over a hundred years. It may even be that the composer himself only knows those things in his subconscious. Some symphonies are programmatic – which is to say that the composer himself has acknowledged that the work sets about telling a particular story – a fine example is the 7th Symphony of Shostakovich, which represents the siege of Leningrad in WWII. Some symphonies express acknowledged thoughts, emotions, and musical recollections evoking a particular subject – such as Mendelssohn’s Italian (No 4) and Scottish (No 3) symphonies and Corigliano’s 1st symphony (prompted by the AIDS epidemic). Many entire symphonic oeuvres were prompted by profoundly religious (e.g. Bruckner) or existential (e.g. Mahler) emotions.
You can’t talk about the Symphony without talking about the dreaded “curse of the ninth”. Beethoven wrote nine symphonies and then died. Shortly afterwards, Schubert died with his 9 symphonies (one unfinished) in the bag. Then came Dvorak, Bruckner, and Mahler. There are others, including the English composer Ralph Vaughan Williams. Arnold Schoenberg wrote “It seems that the Ninth is a limit. He who wants to go beyond it must pass away … Those who have written a Ninth stood too close to the hereafter.” Some composers went to great lengths to avoid completing a ninth symphony unless they could get a tenth safely in the bag immediately afterwards. These include Gustav Mahler, who instead titled his ninth symphony “Das Lied Von Der Erde”. With that safely completed, he wrote his formal 9th symphony … and then expired with his 10th barely begun. Amusing though it might be, the “curse of the ninth” is of course a fallacy, but one which many contemporary composers still acknowledge as a superstition in whose eye they really don’t want to poke a stick.
Some great composers wrote little of note outside of their symphonic output. Others never once in long and productive careers turned their hand to the format – Wagner and Verdi spring to mind. There are a few who were strangely reluctant to approach the form – Stravinsky composed four of them, but pointedly refused to assign numbers to them. In any case, the most important aspect of a Symphony is that – with very few exceptions – symphonies are among their composers’ most sincere and personally committed works. They are therefore often listed amongst their composers’ most significant works, and they are also among the most performed and recorded.
Here is a list of Symphonies that might go easy on the ear of a new listener interested in exploring the oeuvre, with some recommended recordings:
Mozart: Symphony No 40 (Mackerras, Prague Chamber, Telarc)
Beethoven: Symphony No 5 (Kleiber, Vienna Philharmonic, DG)
Brahms: Symphony No 4 (Kleiber, Vienna Philharmonic, DG)
Dvorak: Symphony No 8 (Kertesz, LSO, Decca)
Tchaikovsky: Symphony No 6 (Haitink, Royal Concertgebouw, Philips)
And a few that might challenge the already initiated:
Nielsen: Symphony No 5 (Davis, LSO, LSO Live!)
Mahler: Symphony No 7 (Tilson Thomas, SF Symphony, Blue Coast)
Vaughan Williams: Symphony No 5 (Boult, London Philharmonic, EMI)
Corigliano: Symphony No 1 (Barenboim, Chicago Symphony, Erato)
Shostakovich: Symphony No 7 (Haitink, London Philharmonic, Decca)
California-based Blue Coast Records is a pioneering producer of downloadable DSD recordings. Cookie insists that all her recordings are 100% analog to DSD encodings, with no intermediate PCM conversions in any form. This is quite important, because it means that all mixing, panning, fading, etc has to be done entirely in the analog domain since the DSD format does not enable this to be done in the digital domain.
Blue Coast’s DSD offerings are mostly recorded in her own studio, using a methodology she refers to as “ESE” (Extended Sound Environment). These are some of the finest recordings you will ever own. Cookie also sells a very limited selection of recordings from other studios whose work meets her exacting requirements. A long-standing personal history with Sony means that she is now able to offer a selection of Mahler symphonies recorded by The San Francisco Symphony, conducted by Michael Tilson Thomas. At the moment, symphonies 1, 2, 4 and 7 are offered, and it is to be hoped that this will be expanded in due course to the whole cycle.
Typically, these specialist recordings are very, very expensive. We’re looking at $50 – $75 here. But for the month of November, Cookie is making Mahler’s 7th Symphony available for ONLY $12. That’s right, a stunning, no-compromise DSD download of a Mahler Symphony that typically comes on a double CD, for just 12 bucks. Your choice of original DSD or PCM (24/96 or 16/44). Please hurry to buy this before they rush her off in a straitjacket for some recuperation time in a local “health spa”.
At BitPerfect we love our Mahler. We worship it. Thank you Cookie!
Either way you look at them, the high-end loudspeakers produced by Wilson Audio have a certain unmistakable ‘house style’ aesthetic. They have a well-known ‘house sound’ too, and it may float your boat or it may not, but in any case it appears to this observer that Chez Wilson, form follows function. And now, to boot, form can follow function in any colour you like! As to price – well, if you have to ask, you can’t afford it!
I have spent time with Wilson’s Sophia III and with their Sasha W/P models. But I want to talk about their higher-end models, the Alexia and Alexandra XLF. These have the midrange drivers and tweeters in a separate box which is mounted above the bass bin inside a frame which allows them to be tilted through a quite surprising range of settings, the idea being, as I understand it, to allow for very precise time alignment depending on where the listener is located. As a rule, the bigger the speaker, the greater the physical separation between the drive units, and, therefore, the greater is the potential benefit to be had by getting the temporal alignment just so. At least, that’s the theory.
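A rough sketch of why the ideal alignment depends on where the listener sits: it compares the straight-line arrival times of an upper and a lower driver at the listener’s ears. Everything here is an assumption for illustration only. The drivers are treated as point sources on a common vertical baffle, each driver’s acoustic centre depth and the crossover phase are ignored, the heights are hypothetical numbers, and Wilson’s actual tilting mechanism is not modelled.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 degrees C


def arrival_ms(driver_height: float, ear_height: float, distance: float) -> float:
    """Travel time (ms) from a point-source driver to the listener's ear."""
    path = math.hypot(distance, driver_height - ear_height)
    return 1000.0 * path / SPEED_OF_SOUND


def misalignment_ms(upper: float, lower: float, ear: float, distance: float) -> float:
    """How much later the lower driver's sound arrives than the upper driver's."""
    return arrival_ms(lower, ear, distance) - arrival_ms(upper, ear, distance)


# Hypothetical geometry: upper driver 1.1 m up, bass driver 0.3 m up, ears at 1.0 m.
for distance in (2.0, 3.0, 4.0):
    print(f"{distance} m: {misalignment_ms(1.1, 0.3, 1.0, distance):.3f} ms")
```

In this simplified geometry the mismatch is a fraction of a millisecond and shrinks as the listener moves further back, which is why a fixed alignment can only be exactly right for one seating position.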
Tim spent some time observing Peter McGrath setting up a pair of Alexias. This involves positioning them in the room in the usual way, and then aligning the upper bins. Given the way the design works, this is, as you might expect, very easy to do. The surprising thing, however, was the effect of getting the time alignment right. Wilsons are well known for, among other things, their holographic imaging properties. What Tim heard was how remarkably the image seemed to snap into place once the alignment was right. It took Peter McGrath just 10 minutes to do the whole job, but then again he knows what he is doing! Interestingly enough, the image snapped into place not just for the lucky person in the sweet spot, but for quite a range of other listening positions too. Tim says they are comfortably the best speakers he has ever heard – and this from a guy who owns Stax SR-009’s.
Recently, I spent some time refining the set-up of my own speakers. My B&W 802 Diamonds are not quite in the Wilson league for imaging, but they are still pretty good. However, my listening room’s dimensions are unkind, and every now and then, having pondered long and hard over what problem I should be trying to solve, I try my hand at some room treatment work. It’s a never-ending process. In this case, I built a massive absorbing panel, about 6’ x 4’, and located it on the ceiling above the speakers, towards the back of the room. When you do stuff like this, it throws your previously optimized speaker set-up out of whack, and you have to start all over again.
I ended up moving my speakers a little more than 4 inches closer together, which gives you an idea of the sort of positioning accuracy you need to bear in mind. I had got the tonal balance where I wanted it, and the imaging was sort of correct. Instruments and performers were all where they should have been, but the ‘holographic’ element was missing – you could locate the position of instruments reasonably well, but somehow you could not just shut your eyes and visualize the performer. Trying to get this right, there are a couple of recordings I like to go to. These are invariably recordings I had played through the Wilson Sophia III’s, and as a result I had a good idea of what I ought to have been hearing imaging-wise. And I wasn’t hearing it.
I remembered what Tim said about the Alexias, and how Peter solved that problem by the simple expedient of tilting the mid/tweeter unit forward in its frame. My 802’s don’t have that adjustment. But then I thought why not just try tilting the whole kit & caboodle forward? I did. Nothing happened. So I tilted them a bit more. Still nada. By that time I had run out of adjustment range on the 802’s very beefy threaded spikes. So I found some wood to prop up the rear spikes and tilted them as far forward as I dared (802’s are deceptively heavy). Well, that did the trick. All of a sudden the soundstage deepened and widened, and individual instruments began to occupy a more definable space. In particular, vocalists now appear tightly located, centre stage, just behind the plane of the speakers, and just in front of the drum kit. Kunzel’s 1812 cannons are amazingly precisely located. Job done!
The rear spikes now sit in cups on a pair of Black Dahlia mounts, and everything is pretty solid. With the tilt, I found I needed to position them a couple of inches further back, but that’s fine – nobody can get behind them now (have you noticed how people always seem to be irresistibly drawn to the rears of large loudspeakers?) and accidentally topple them forwards. See the photograph below for an indication of the degree of tilt.
I’m not sure quite why this tilting has the effect it has. The design of the 802’s is such that the vertical and horizontal dispersion are probably very similar, outside of the crossover region at any rate. Perhaps I am reducing the energy reflected off the ceiling, but that is speculation, and well outside my sphere of competence. In any case tilting is surely a tool we can all add to our room-tuning arsenal. It will certainly be a big part of mine for some time to come. At least until I can afford Alexias …
We learned over the last couple of days how DSD works as a format, and what its basic parameters are. It is a 1-bit system, sampled at 2.82MHz, relying heavily on oversampling and Noise Shaping. We didn’t say much about the actual mechanism of Noise Shaping because, frankly, it relies on some pretty dense mathematics, and so we haven’t said much either about what the resultant data stream actually represents.
We learned that each individual bit is somehow like an Opinion Poll, where instead of asking the bit to tell us what the signal value is, we ask it whether it thinks it should be a one or a zero. The bit is like an individual respondent – it doesn’t really know, but it has a black & white opinion, which might be right or wrong. But by asking the question of enough bits, we can average out the responses and come up with a consensus value. So each individual bit does not represent the actual value of the signal, but on the other hand an average of all the bits in the vicinity gets pretty close! So, at any point in time, in order to represent the signal, some of the bits are ones and some are zeros, and, to a first approximation, it does not matter too much how those ones and zeros are distributed. But here is a quick peek into Noise Shaping: it works by exploiting precisely that freedom in how the ones and zeros are distributed.
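The opinion-poll idea can be seen in the simplest possible 1-bit encoder, the textbook first-order sigma-delta modulator. I have chosen it only because it is easy to read; real DSD encoders use much higher-order Noise Shaping, which this sketch does not attempt. Each bit "votes" +1 or -1, and the running error is fed back so that the votes, averaged over many bits, agree with the input. Incidentally, with a zero input this particular modulator produces the alternating 1010… pattern.

```python
def sigma_delta(samples: list[float]) -> list[int]:
    """First-order 1-bit sigma-delta modulator (a toy, not a real DSD encoder).

    Each input sample must lie in [-1, 1]. The accumulator keeps a running
    total of (input - output), so every bit is chosen to pull the long-run
    average of the outputs back towards the input.
    """
    acc = 0.0
    bits = []
    for x in samples:
        acc += x
        bit = 1 if acc >= 0 else 0
        acc -= 1.0 if bit else -1.0  # subtract the level actually emitted
        bits.append(bit)
    return bits


def consensus(bits: list[int]) -> float:
    """Average the 'votes': each 1 counts as +1, each 0 as -1."""
    return sum(1 if b else -1 for b in bits) / len(bits)


print(sigma_delta([0.0] * 8))                 # [1, 0, 1, 0, 1, 0, 1, 0]
print(consensus(sigma_delta([0.25] * 1000)))  # close to 0.25
```

No single bit says "0.25", yet the average of a thousand of them does, which is the sense in which the bitstream represents the signal.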
An interesting way of looking at it is that the signal itself represents the probability that the value of the bit will be a one or a zero. If the probability is higher, a higher proportion of the bits will be ones, and if it is lower the proportion will be correspondingly lower. As the waveform oscillates between high and low, so the relative preponderance of ones over zeros in the bitstream oscillates between high and low. The value of any one individual bit – whether it is a one or a zero – says very, very little about the underlying signal. That is quite a remarkable property. An individual bit could be subject to some sort of reading error and come out totally wrong, and, provided there were only a small number of such errors, it is arguable that you would never actually know that the error had happened!
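Here is a toy demonstration of that robustness, under assumptions of my own: a first-order modulator stands in for a real DSD encoder, a simple moving average stands in for the DAC’s low-pass filter, and the three error positions are arbitrary. We encode a sine wave, flip three widely separated bits, and measure the largest change that survives filtering.

```python
import math


def sigma_delta(samples):
    """Toy first-order 1-bit modulator: returns a list of +1/-1 levels."""
    acc, out = 0.0, []
    for x in samples:
        acc += x
        level = 1 if acc >= 0 else -1
        acc -= level
        out.append(level)
    return out


def moving_average(levels, window):
    """Crude low-pass filter: the mean of each run of `window` consecutive levels."""
    return [sum(levels[i:i + window]) / window
            for i in range(len(levels) - window + 1)]


WINDOW = 64
signal = [0.5 * math.sin(2 * math.pi * n / 1000) for n in range(3000)]
clean = sigma_delta(signal)

corrupted = list(clean)
for position in (400, 1500, 2600):  # three isolated "reading errors"
    corrupted[position] = -corrupted[position]

worst = max(abs(a - b) for a, b in zip(moving_average(clean, WINDOW),
                                       moving_average(corrupted, WINDOW)))
print(f"worst-case change after filtering: {worst:.4f} (peak level is 1.0)")
```

Each flipped bit moves the filtered output by 2/64, about 3% of peak level, and only for the 64 output samples whose window contains it. A longer, better filter dilutes it further.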
Compare this with PCM. In a PCM signal, we can argue that every single bit means something. It says something highly specific about the value of the signal at some specific point in time. Some bits say more important things than others. For example, the Most Significant Bit (MSB) tells us whether the signal is positive or negative. If there is a reading error and that comes out wrong, the impact on the resultant signal can be massive. Because every bit in a PCM system has a specific meaning, and every bit in a DSD system has a nebulous meaning, it should be no surprise that there is no mathematical one-to-one correspondence between PCM data and DSD data. Sure, you can convert PCM to DSD, and vice versa, but there is no mathematical identity that links the two – unlike a signal and its Fourier Transform, each of which is a direct representation of the other in a different form. Any transformation from one to the other is therefore subject to a lossy algorithm. Of course, an appropriate choice of algorithm can minimize the loss, but the twain are fundamentally incompatible.
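For contrast, here is what a single bit error does to a 16-bit PCM sample. This sketch assumes the usual two’s-complement representation, in which the MSB is the sign bit, and flips the least significant bit, a middle bit, and the MSB of a quiet sample.

```python
def flip_bit(sample: int, bit: int, width: int = 16) -> int:
    """Flip one bit of a two's-complement PCM sample and return the new value."""
    mask = (1 << width) - 1
    raw = (sample & mask) ^ (1 << bit)      # work on the raw bit pattern
    sign_bit = 1 << (width - 1)
    return raw - (1 << width) if raw & sign_bit else raw  # back to signed


sample = 1000  # a quiet 16-bit sample
for bit in (0, 7, 15):
    damaged = flip_bit(sample, bit)
    error = abs(damaged - sample) / 32768   # as a fraction of full scale
    print(f"bit {bit:2d}: {sample} -> {damaged:6d}  (error {error:.6f} of full scale)")
```

Flipping the LSB changes the value by one step in 32768; flipping the MSB turns a quiet positive sample into a loud negative one, an error of a full half of the total range.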
However, let us look at some similarities. Let’s look at the function of the DAC. For a PCM DAC, its job is to recreate the values of a voltage encoded by the data at each sample point. Those voltages go up and down according to the data in the PCM data stream. We just need to pass that waveform through a low-pass filter and the result is music. Now let’s compare that with DSD. For a DSD DAC, its job is to recreate the values of a voltage encoded by the data at each sample point. Those voltages go up and down according to the data in the DSD data stream. We just need to pass that waveform through a low-pass filter and the result is music. Hang on one minute … wasn’t that just the same thing? Yes it was. For 16/44.1 (CD) audio, the PCM DAC is tasked with creating an output voltage with 16-bit precision, 44,100 times a second. On the other hand, for DSD the DSD DAC is tasked with creating an output voltage with 1-bit precision, 2,822,400 times a second. In each case the final result is obtained by passing the output waveform through a low-pass filter.
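A toy model makes the point that one routine can serve both formats. In this sketch, which is my own simplification and not how any real DAC chip is built, the "DAC" holds each sample value for a number of output ticks and then smooths the staircase with a moving average standing in for the low-pass filter. PCM arrives as a few multi-level samples held for 64 ticks each (64 being the ratio of 2,822,400 to 44,100); DSD arrives as two-level samples held for one tick each.

```python
CD_RATE = 44_100
DSD64_RATE = 64 * CD_RATE  # 2,822,400 samples per second


def toy_dac(values: list[float], hold: int, window: int) -> list[float]:
    """Hold each sample value for `hold` output ticks, then smooth the
    resulting staircase with a moving average of `window` ticks.

    The same routine serves for PCM (many levels, long hold) and for
    DSD (two levels, hold of one tick); only the input differs.
    """
    staircase = [v for v in values for _ in range(hold)]
    return [sum(staircase[i:i + window]) / window
            for i in range(len(staircase) - window + 1)]


print(16 * CD_RATE, "bits per second per channel for 16/44.1 PCM")
print(1 * DSD64_RATE, "bits per second per channel for DSD")

pcm = [0.0, 0.5, 0.0, -0.5]       # four 16/44.1 samples, scaled to +/-1
dsd_silence = [1.0, -1.0] * 128   # 256 DSD bits of 1010... silence
pcm_out = toy_dac(pcm, hold=DSD64_RATE // CD_RATE, window=64)
dsd_out = toy_dac(dsd_silence, hold=1, window=64)
```

The arithmetic also shows that DSD carries four times as many bits per second per channel as 16/44.1 PCM: 2,822,400 against 705,600.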
That is an interesting observation. Although the data encoded by PCM and DSD are fundamentally different – we just got through with describing how they mean fundamentally different things – now we hear that the process for converting both to analog is exactly the same? Yes. Strange but true. From a functionality perspective, as far as a DAC is concerned, DSD and PCM are the same thing!
By the way, I have mentioned how we can add Noise Shaped dither to a PCM signal and in doing so encode data below the resolution limit of the LSB. Our notional view of PCM is that the data stream explicitly encodes the value of the waveform at a sequence of instants in time, and yet, if we have encoded sub-dynamic data, that data cannot be encoded in that manner. Instead, by Noise Shaping, it is somehow captured in the way the multi-bit data stream evolves over time. Rather like DSD, you might say! There is definitely a grey area when it comes to calling one thing PCM and another thing DSD.
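Here is a small sketch of sub-LSB information surviving in a multi-bit stream. It uses first-order error feedback, the simplest form of Noise Shaping, and leaves out the random dither component for clarity, so it illustrates only the Noise Shaping half of "Noise Shaped dither". A slow sine wave only 0.4 LSB in amplitude is requantized to whole LSBs, once with plain rounding and once with error feedback, and block averages stand in for low-pass filtering.

```python
import math


def plain_round(signal: list[float]) -> list[int]:
    """Quantize to whole LSBs with ordinary rounding."""
    return [round(x) for x in signal]


def noise_shaped_round(signal: list[float]) -> list[int]:
    """Quantize to whole LSBs with first-order error feedback: each sample's
    rounding error is carried into the next sample instead of being lost."""
    carried = 0.0
    out = []
    for x in signal:
        target = x + carried
        q = round(target)
        carried = target - q
        out.append(q)
    return out


def block_means(values, size):
    """Average consecutive blocks: a crude stand-in for low-pass filtering."""
    return [sum(values[i:i + size]) / size for i in range(0, len(values), size)]


# A slow sine only 0.4 LSB in amplitude: no single sample can show it.
signal = [0.4 * math.sin(2 * math.pi * n / 2000) for n in range(4000)]
truth = block_means(signal, 100)
print(max(abs(v) for v in block_means(plain_round(signal), 100)))  # 0.0: gone
shaped = block_means(noise_shaped_round(signal), 100)
print(max(abs(a - b) for a, b in zip(shaped, truth)))  # small: it survives
```

Every individual output sample is a whole number of LSBs, mostly 0 or ±1, yet the averaged output follows the 0.4 LSB sine to within 0.01 LSB per block. The information lives in how the sample values evolve over time, not in any one of them.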
We started off this series of posts by mentioning the different ‘flavours’ of DSD that are cropping up out there. Now that I have set the table, I can finally return to that.
DSD in its 1-bit 2.82MHz form is the only form that can be described correctly (and pedantically) as DSD. We saw how it represents the lowest sample rate at which a 1-bit system could be Noise Shaped to deliver a combination of dynamic range and frequency response which at least equalled that delivered by CD. What it in fact delivers is a significant improvement in dynamic range, and more of a loosening of the restrictions on high-frequency response imposed by CD than a major extension of it. In any case, that is enough for most listeners to come out in favour of its significant superiority. However, a significant body of opinion holds that by increasing the sample rate yet further, we can achieve a valuable extension of the high-frequency response. (In principle, we could also increase the dynamic range, but DSD is already capable of exceeding the dynamic range of real-world music signals.) People are already experimenting with doubling, quadrupling, and even octupling 1-bit sample rates. Terminology for these variants is settling on DSD128, DSD256, and DSD512 respectively (with actual DSD being referred to as DSD64). Why do this? Partially because we can. But – early days yet – reports are emerging of listeners who are declaring them to be significantly superior.
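The naming follows from simple arithmetic: each variant runs at that multiple of the 44.1kHz CD sample rate. This short sketch lists the resulting 1-bit sample rates and the raw data rate per channel. It counts payload bits only and ignores any file-format overhead.

```python
CD_RATE = 44_100  # the DSD family's sample rates are multiples of the CD rate

DSD_FAMILY = {"DSD64": 64, "DSD128": 128, "DSD256": 256, "DSD512": 512}


def dsd_rate(multiple: int) -> int:
    """1-bit sample rate in samples (and therefore bits) per second per channel."""
    return multiple * CD_RATE


for name, multiple in DSD_FAMILY.items():
    rate = dsd_rate(multiple)
    # One bit per sample, so divide by 8 for bytes per second per channel.
    print(f"{name}: {rate / 1e6:.4f} MHz, {rate // 8:,} bytes/s per channel")
```

Each doubling of the sample rate also doubles the amount of data, so a DSD512 stream carries eight times the bits of plain DSD.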
There are also formats – mostly proprietary ones which only exist ephemerally within DAC chips or pro-audio workstations – which replace the 1-bit quantizer with a multi-bit quantizer. These have occasionally been referred to as “DSD-Wide”. I won’t go into them in much detail, but there are some interesting reasons you might want to use multi-bit quantizers. Some established authorities in digital audio – most notably Stanley Lipshitz of the University of Waterloo – have come out against DSD largely because of its 1-bit quantizers. Lipshitz’ most significant objection is a valid one. To create a DSD (in its broadest sense) bitstream, a Sigma Delta Modulator (SDM) is used. For such a modulator to achieve the required level of audio performance, it must be of high order, meaning that a high-order loop performs the Noise Shaping. High-order modulators turn out to be prone to instability if you use a 1-bit quantizer, but can be made stable by adopting a multi-bit quantizer. In practical terms, though, many of Lipshitz’ objections have been addressed in real-world systems, so I won’t pursue that topic further.
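A rough way to see the stability problem is to simulate a modulator whose noise transfer function is a pure N-th order differentiator, (1 - z^-1)^N, built in error-feedback form. The Python sketch below assumes a constant input of 0.1 and compares a 1-bit quantizer with a finely spaced multi-bit quantizer that is assumed never to overload. This is a textbook-style toy with hypothetical function names, not how commercial DSD modulators are built; those use carefully designed loop filters precisely so that they stay stable with 1-bit quantizers.

```python
from math import comb

def ntf_coeffs(order):
    """Feedback taps c_1..c_N such that 1 + C(z) = (1 - z^-1)^N."""
    return [(-1) ** k * comb(order, k) for k in range(1, order + 1)]

def one_bit(u):
    """1-bit quantizer: output is +1 or -1."""
    return 1.0 if u >= 0 else -1.0

def multi_bit(u, step=1 / 8):
    """Uniform rounding quantizer, assumed to have enough levels never to overload."""
    return step * round(u / step)

def peak_state(order, quantize, x=0.1, n=10_000, limit=1e6):
    """Run an error-feedback modulator and return the peak |quantizer input|."""
    taps = ntf_coeffs(order)
    errs = [0.0] * order          # e[n-1], e[n-2], ..., e[n-N]
    peak = 0.0
    for _ in range(n):
        u = x + sum(c * e for c, e in zip(taps, errs))
        e_now = quantize(u) - u   # quantization error of this sample
        errs = [e_now] + errs[:-1]
        peak = max(peak, abs(u))
        if peak > limit:          # state has run away; stop before floats overflow
            break
    return peak

for order, q, label in [(1, one_bit, "1-bit"), (5, one_bit, "1-bit"), (5, multi_bit, "multi-bit")]:
    print(f"order {order}, {label}: peak |u| = {peak_state(order, q):.3g}")
```

With order 5, the 1-bit modulator's internal state runs away within a few dozen samples, while the multi-bit version stays bounded (below about 2), because its quantization error can never exceed half a step. A first-order 1-bit modulator also stays bounded.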
But ever since SACD (which uses the DSD system) first came out, DAC designers have recognized that a DAC’s performance can be significantly improved by using one of the “extended-DSD” formats. So, internally, the majority of DSD-capable chipsets convert the incoming DSD to their choice of “extended-DSD” format, and do the actual digital-to-analog conversion there. This involves first passing the DSD bitstream through a low-pass filter, which yields a PCM data stream in an ultra-high-resolution floating-point format sampled at 2.82MHz. This is then immediately oversampled to the required sample rate and converted to the “extended-DSD” format using a digital SDM. Unfortunately, because so much high-frequency noise has been shaped into the ultrasonic region, the low-pass filter has to share some of the undesirable characteristics of the brick-wall filters that characterize all PCM formats. So the proponents of DSD128, DSD256, and so forth may well be onto something, provided those formats can be converted directly in the DAC without any “extended-DSD” reformatting.
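The conversion chain just described can be sketched as a toy in Python. Several simplifying assumptions apply: first-order modulators stand in for the high-order ones real chips use, a 64-tap moving average stands in for the low-pass filter (it is nothing like a proper brick-wall design), a five-level quantizer stands in for an “extended-DSD” format, the oversampling step is skipped so every stream runs at the same abstract rate, and the function names are hypothetical. What it does show is that low-pass filtering a 1-bit stream yields multi-bit PCM, and that a multi-bit SDM can then re-encode that PCM while preserving its low-frequency content.

```python
import math

def first_order_sdm(x, levels):
    """First-order error-feedback sigma-delta modulator.

    `levels` lists the allowed output values; the quantizer picks the nearest
    one. With levels [-1.0, 1.0] this is a 1-bit modulator.
    """
    out, e_prev = [], 0.0
    for s in x:
        v = s - e_prev                               # gives y = x + (1 - z^-1) e
        y = min(levels, key=lambda q: abs(q - v))    # nearest allowed level
        e_prev = y - v
        out.append(y)
    return out

def moving_average(x, taps):
    """Crude FIR low-pass: mean of the last `taps` samples (zeros before start)."""
    out, acc = [], 0.0
    for i, s in enumerate(x):
        acc += s
        if i >= taps:
            acc -= x[i - taps]
        out.append(acc / taps)
    return out

n, taps = 20_000, 64
x = [0.5 * math.sin(2 * math.pi * i / 2_000) for i in range(n)]

bits = first_order_sdm(x, [-1.0, 1.0])        # stand-in for a 1-bit DSD stream
pcm = moving_average(bits, taps)              # low-pass filter -> multi-bit PCM
five_level = [-1.0, -0.5, 0.0, 0.5, 1.0]
wide = first_order_sdm(pcm, five_level)       # stand-in for an "extended-DSD" stream

print("distinct PCM levels:", len(set(pcm)))
print("levels used by the multi-bit stream:", sorted(set(wide)))
```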
I hope you found these posts, which take a peek under the hood of DSD, informative and interesting. Although the mathematics of PCM can be challenging at times, those of DSD are that and more, in spades. It is likely that progress in this field will continue to be made. In the meantime, condensing it into a form suitable for digestion by the layman remains a challenge of its own 🙂