Monthly Archives: May 2013

If you have already installed iTunes 11.0.3 then you have two options.

Option 1 is to use BitPerfect in “Minimize iTunes” mode. We have tried that here for a couple of hours and it seems to work.

Option 2 is to uninstall iTunes and re-install iTunes 11.0.2 – here are some instructions:
http://osxdaily.com/2012/02/06/delete-itunes-mac-os-x/
You will have to download iTunes 11.0.2 from here:
http://mac.filehorse.com/download-itunes/

After re-installing iTunes 11.0.2, it may show an error message when reading one of iTunes’ “.itl” files.  Ideally, you will be able to replace this file with an older version that you have previously backed up using Time Machine.  Failing that, the solution is to re-import your music after telling iTunes not to organize and not to copy files (don’t forget to set the organize/copy settings back to what they were afterwards).

DO NOT INSTALL iTunes 11.0.3 yet!  I just installed it and it does not appear to function correctly with BitPerfect.  I need to do more testing to confirm.  I will post an update ASAP.

Clipping is a term that originated with analog audio, and refers to the situation where the magnitude of the signal rises to a level larger than the medium can store or the electronics can deliver.  For example, with magnetic tape the signal is stored as a pattern of magnetization on the tape, but if the musical signal gets beyond a certain level the tape saturates – it simply cannot hold enough magnetization to represent it.  The same thing happens with an amplifier – if you amplify a signal enough, you will eventually run out of voltage (or current) at the output.

With magnetic tape, as well as – generally speaking – with good old-fashioned vacuum tube amplifiers, when the signal level approaches and exceeds the maximum the system was designed to handle, the musical peaks get gradually compressed – so gradually, in fact, that for the most part you don’t notice it happening.  This so-called “soft clipping” meant that clipping was not the most crucial sonically degrading issue faced by early audio designers.

This all changed with the advent of solid-state electronics.  Your typical transistor amplifier does not soft-clip.  It hard-clips.  This means that when it tries to deliver an output voltage larger than the maximum it was designed for, the output voltage just sits at that maximum value and stays there until the signal drops back below it.  The peak of the signal is simply wiped out, and the waveform develops a flat-topped appearance everywhere this hard-clip occurs.  Imagine Shaquille O’Neal walking through your front door, and instead of gracefully ducking to avoid bumping his head, the door simply chops his head off.  The effect on the music is similarly messy.

In digital audio, the effect of clipping can actually be even worse!  Let’s look at what happens when a signal is clipped.  The easiest way to do that is to treat the clipping as an error signal which is added to the music signal.  This error signal comprises nothing but the peaks that got chopped off.  If we analyze this signal, we find that it has frequency components which extend from within the audio bandwidth (generally considered to be about 16Hz – 20,000Hz) on up into frequency ranges above the audio bandwidth.  In analog space, we can generally just ignore any components above the audio bandwidth because we can’t hear them anyway.  But in digital audio we can’t do that.

Typical digital audio has a sampling frequency of 44,100Hz, the standard developed for the Compact Disc.  There is a firm and fixed mathematical law that says if we want to sample a waveform at a certain frequency, then we have to make sure that the waveform contains no frequencies above exactly one half of the sampling frequency.  This frequency is termed the “Nyquist” frequency.  For CD, that means the waveform must contain no content at any frequency above 22,050Hz.  What happens if you try to encode a signal at, say, “N” Hz ABOVE the Nyquist frequency?  What you find is that the result is EXACTLY THE SAME as you would have got if the signal had instead been “N” Hz BELOW the Nyquist frequency.  When you play back this signal, it is not the original high frequencies you will hear, but the “bogus” lower ones.  This effect is called mirroring (more formally, aliasing), and is a very audibly destructive artifact.  It explains why the original analog signal has to be very tightly filtered prior to being sampled, to eliminate all traces of any frequency components above the Nyquist frequency.
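
Here is a minimal sketch of that mirroring effect in Python; the frequencies and sample count are just illustrative choices.  A tone 1,000Hz above the Nyquist frequency produces the same sequence of samples as a tone 1,000Hz below it.

```python
# A minimal sketch of "mirroring" (aliasing): at a 44,100Hz sampling rate,
# a tone N Hz above the Nyquist frequency yields exactly the same samples
# as a tone N Hz below it.  Frequencies and sample count are illustrative.
import math

fs = 44100.0          # CD sampling rate, Hz
nyquist = fs / 2.0    # 22,050 Hz
N = 1000.0            # offset from the Nyquist frequency, Hz

above = [math.cos(2 * math.pi * (nyquist + N) * n / fs) for n in range(64)]
below = [math.cos(2 * math.pi * (nyquist - N) * n / fs) for n in range(64)]

# The two sampled waveforms are indistinguishable; the difference is
# nothing but floating-point rounding error.
print(max(abs(a - b) for a, b in zip(above, below)))   # ~1e-12
```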

Back to clipping.  If you take a perfectly good signal in the digital domain and perform some signal processing on it, the possibility generally exists that the resulting signal will contain peaks above the maximum value that can be represented by the digital encoding system.  What do you do with those peaks?  The easiest thing is to “clip” them at the digital maximum, so that – just as with analog clipping in a solid-state amplifier – each sample that works out to be above the digital maximum is encoded as the digital maximum.  You will have, in effect, encoded a waveform containing frequency components above the Nyquist frequency.  When you play back that signal, those otherwise inaudible components will be recreated as audible components at corresponding frequencies below the Nyquist frequency.  This will sound even worse than hard-clipping in an amplifier.
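
To make that concrete, here is a small sketch of the “easiest thing” described above – simply pinning any over-range sample at digital full scale.  The gain value and the test tone are arbitrary illustrative choices.

```python
# A sketch of digital hard clipping: after a gain stage, any sample that
# falls outside 16-bit full scale is simply pinned at the limit.  The
# gain and the test tone are arbitrary illustrative choices.
import math

POS_FS, NEG_FS = 32767, -32768   # 16-bit digital full scale

def hard_clip(sample):
    return max(NEG_FS, min(POS_FS, sample))

tone = [int(30000 * math.sin(2 * math.pi * 1000 * n / 44100)) for n in range(441)]
boosted = [int(2.0 * s) for s in tone]          # drive the peaks into overload
clipped = [hard_clip(s) for s in boosted]

# Every peak comes out flat-topped – the chopped-off portion is exactly
# the "error signal" discussed earlier.  The first number printed is well
# above full scale; the second is pinned at exactly 32767.
print(max(boosted), max(clipped))
```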

The solution is to use mathematics to “re-shape” the portion of the signal that is being driven into clipping, in such a way as to remove all of the unwanted high-frequency components.  Of course, there will be a sonic price to pay, even for this.  But once you have driven the signal into overload in the first place, there is no escaping without some sort of penalty.
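
What that “re-shaping” looks like in detail is the designer’s secret sauce.  Purely as an illustration of the concept – and emphatically not BitPerfect’s actual algorithm – here is one very crude way to bend over-range samples smoothly back into the available headroom instead of chopping them off:

```python
# A crude illustration only – not BitPerfect's algorithm.  Instead of a
# hard clamp, over-range samples are bent smoothly into the remaining
# headroom with a tanh curve.  The 0.8 threshold is an arbitrary choice.
# The rounded peaks carry far less spurious high-frequency energy than
# the flat tops produced by hard clipping, though a production-quality
# limiter would be considerably more sophisticated than this.
import math

def soft_limit(sample, threshold=0.8):
    """Leave |x| <= threshold untouched; bend anything above it smoothly."""
    if abs(sample) <= threshold:
        return sample
    headroom = 1.0 - threshold
    excess = abs(sample) - threshold
    limited = threshold + headroom * math.tanh(excess / headroom)
    return math.copysign(limited, sample)

print(soft_limit(0.5))    # 0.5  – in-range samples pass through untouched
print(soft_limit(1.4))    # ~0.999 – pulled back into range, never reaches 1.0
```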

This sort of situation arises in general with any form of signal processing, but “mirroring” is most commonly encountered when down-sampling from a higher sample rate to a lower one – particularly when the source material is derived from DSD, which (by design) contains a lot of high-frequency noise.  In general, you have to assume that the higher-rate “source” data can contain frequency components anywhere below its own Nyquist frequency.  But some of those frequencies can still be higher than the Nyquist frequency of the lower sample rate which is the “target” of the conversion.  So, unless you know for certain that the “source” material contains no frequency content above the Nyquist frequency of the “target”, your downsampling process needs to incorporate an appropriately designed low-pass digital filter.
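
As a rough sketch of that structure – not a production-quality resampler; the filter here is a simple windowed-sinc FIR and the rates are illustrative – the essential point is simply that the low-pass filter comes before any samples are discarded:

```python
# A rough sketch of downsampling with a low-pass filter in front of the
# decimation step.  The windowed-sinc FIR and the 4:1 rate ratio are
# illustrative choices, not a production-quality resampler.
import math

def lowpass_taps(cutoff, fs, num_taps=101):
    """Windowed-sinc low-pass FIR taps (Hann window), unity DC gain."""
    fc = cutoff / fs                       # normalized cutoff, cycles/sample
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        x = i - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(sinc * window)
    scale = sum(taps)
    return [t / scale for t in taps]

def downsample(samples, src_rate, dst_rate):
    """Filter to the target Nyquist frequency, then keep every (src/dst)-th sample."""
    factor = src_rate // dst_rate          # assumes an integer ratio, e.g. 4:1
    taps = lowpass_taps(dst_rate / 2, src_rate)
    filtered = [
        sum(taps[k] * samples[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(samples))
    ]
    return filtered[::factor]

# e.g. 176,400Hz -> 44,100Hz: a 30kHz tone in the source is strongly
# attenuated by the filter instead of folding down to a bogus 14,100Hz.
src = [math.cos(2 * math.pi * 30000 * n / 176400) for n in range(2000)]
out = downsample(src, 176400, 44100)
```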

Most of us understand that PCM audio data “samples” (measures) the music signal many times a second (44,100 times a second for a CD) and stores each result as a number.  For a CD this number is a 16-bit number.  A 16-bit number can take on whole-number values anywhere between 0 and 65,535.  Whole-number values means it can take on values such as 27,995 or 13,288, but it cannot take on values such as 1.316 or 377½.  Whilst this works just fine, recall that the music waveforms we are trying to measure swing from positive to negative, not from zero to a positive number.  But it turns out we can work around that.  You see, an interesting property of binary numbers and the inner workings of computers can be brought to bear.
A 16-bit number is just a string of 16 digits, each of which can be either one or zero.  Here is an example, the number 13,244 expressed in ordinary 16-bit binary form: 0011001110111100.  (I hope I don’t need to explain binary numbers to you.)  If this were all zeros it would represent 0, and if it were all ones it would represent 65,535.  But there are actually different ways in which to interpret a sequence of 16 binary digits, and one of these is called “Twos Complement”.
Before going into that, I want to talk about 15-bit numbers.  A 15-bit number can take on values between 0 and 32,767.  Wouldn’t it be nice if we could encode our music as one 15-bit number representing 0 to +32,767 for all those times when the musical waveform swings positive, and another 15-bit number representing 0 to -32,767 for all those times when the musical waveform swings negative?  In fact, we can do that very easily.  We take a 16-bit number, and reserve one of the bits (say, the most significant bit) to read 0 to represent a positive number and 1 to represent a negative number, and use the remaining 15 bits to say how positive (or negative) it is!  Are you with me so far?
We need to make one small modification.  Both the positive and the negative swings encode the value zero.  We can’t have two different numbers both representing the same value, so we need to fix that.  What we do is we say that the negative waveform swings encode the numbers -1 to -32,768 so that the value zero is only encoded as part of the positive waveform swing.  So now we have a system where we can encode the values from -32,768 to +32,767 which makes us very happy.
Let’s do a simple thing.  Take each of our numbers from -32,768 to +32,767 and add 32,768 to them.  We end up with numbers that range from 0 to 65,535 – exactly the range of our original 16-bit number, so our signed values fit into 16 bits with nothing left over.  The encoding computers actually use is just slightly different: each non-negative value keeps its ordinary binary form, and each negative value x is stored as the bit pattern of x + 65,536 (so, for example, -1 is stored as all ones).  This, arrived at in a roundabout way, is the “Twos Complement” representation, and it lets us express 16-bit data in a form that covers both positive and negative values.
It turns out that this makes computers very happy as well, because twos complement numbers can be added, subtracted, and multiplied using exactly the same binary arithmetic as ordinary unsigned integers.  So we can manipulate them in exactly the same way as we do regular integers.  In fact, twos complement representation is so inherently useful to computers that it goes by an even friendlier name – Signed Integers.
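
For the curious, here is a small Python sketch of that mapping (the specific test values are arbitrary):

```python
# A small sketch of the 16-bit Twos Complement mapping described above.
# The specific test values are arbitrary.
def to_twos_complement_16(x):
    """Return the 16-bit bit pattern (as an unsigned value) for signed x."""
    assert -32768 <= x <= 32767
    return x & 0xFFFF                    # a negative x becomes x + 65,536

def from_twos_complement_16(u):
    """Recover the signed value from a 16-bit bit pattern."""
    return u - 65536 if u >= 32768 else u

print(format(to_twos_complement_16(13244), '016b'))   # 0011001110111100
print(to_twos_complement_16(-1))                      # 65535 (all ones)
print(from_twos_complement_16(65535))                 # -1

# The same binary adder serves signed and unsigned alike:
# (-5) + 7 gives the same 16-bit pattern as (65531 + 7) mod 65536.
print((to_twos_complement_16(-5) + to_twos_complement_16(7)) & 0xFFFF)   # 2
```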
Twos Complement (or Signed Integer) representation is such a huge convenience for computer audio, that most audio processing uses this representation.  Amongst other things, simple signal processing functions like Digital Volume Control are more efficient to code with Signed Integers.
There is one thing to bear in mind, though, and it catches a lot of people out.  Recall that the negative swing encodes a larger maximum magnitude than the positive swing.  Here I am going to shift the discussion from the illustrative example of 16-bit numbers to the more general case of N-bit numbers.  The largest negative swing that can be encoded is 2^(N-1), whereas the largest positive swing that can be encoded is 2^(N-1) – 1.  What makes this important is that the ratio between the two is not constant, but depends on N, the bit depth.  This comes into play if you are designing a D-to-A Converter with separate DACs for the negative and positive voltage swings.  You need to design it such that the negative and positive sides would both reach the same peak output for an input of magnitude 2^(N-1), while recognizing that the positive side can never see that value in practice, since the largest signal it should ever receive is 2^(N-1) – 1.  If it ever does receive a signal of 2^(N-1), that would indicate an error in its internal processing algorithms.
Similar considerations exist when normalizing the output of a DSP stage (which should properly be in floating point format) for rendering to integer format.  The processed floating point data is typically normalized to ±1.0000, and it would be an error to map this to ±2^(N-1) in Twos Complement integer space, because this would clip the positive voltage swing at its peak.  Instead it must be mapped to ±(2^(N-1) – 1).
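
Here is a minimal sketch of that final rendering step (N = 16 and the sample values are just examples):

```python
# A minimal sketch of rendering normalized floating-point samples to
# N-bit signed integers, scaling by 2^(N-1) - 1 as described above so
# that a full-scale positive sample cannot overflow.  N = 16 and the
# test values are just examples.
N = 16
POS_MAX = 2 ** (N - 1) - 1       # +32,767
NEG_MAX = -(2 ** (N - 1))        # -32,768

def render(sample):
    """Map a float in [-1.0, +1.0] to an N-bit signed integer."""
    value = int(round(sample * POS_MAX))
    # Guard against rounding (or a slightly over-range input) spilling
    # outside the legal integer range.
    return max(NEG_MAX, min(POS_MAX, value))

print(render(+1.0))   # 32767 – not 32768, which a 16-bit word cannot hold
print(render(-1.0))   # -32767
print(render(0.5))    # 16384
```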
Such things make a difference when you operate at the cutting edge of sound quality.

I wrote a little while ago about the relationship between DSD and PCM – how DSD is a specific implementation of SDM (Sigma-Delta Modulation), and how both ADCs and DACs for PCM are built around SDM engines.  I also wrote about the algorithms that convert data between the two formats, and how the conversions are not entirely lossless.

The picture I left you with was that – for all practical purposes – there is no such thing as “Pure” PCM.  Any PCM music data is derived from SDM at some point in its creation, and has therefore undergone at least one conversion sequence.

I want to ramble further on DSD, and whether music stored and replayed in the DSD format can be any more “Pure”.  The problem is the inescapable fact that – until somebody comes up with a truly significant breakthrough – you cannot “edit” music in the SDM domain.  This has huge ramifications for the recording industry, where recording, mixing, and mastering can involve very profound manipulation of the music.  Indeed something as simple as volume control – a fade-out, for example – cannot be done in the SDM format.  And the recording industry routinely employs way more elaborate effects that would make your hair curl (have you noticed how many recording artists have unnaturally curly hair?…).

To my knowledge, there are only two studio-grade recording desks out there capable of producing commercial DSD recordings – Sonoma and Pyramix.  Of the two, Sonoma is the older and less functional.  Sonoma drops the signal out to PCM for fade-in and fade-out, but apart from that offers no sound manipulation capability.  Pyramix is modern and quite progressive, but it does all its mixing in “DXD”, which is 24-bit 352.8kHz PCM, so all Pyramix DSD is derived from what are essentially DXD masters.

Is there any such thing as a pure DSD recording?  Well, yes, there is, but you have to restrict yourself to transcriptions of analog tape, where no further audio processing is required.  To be fair, there is a good deal of archival material out there which could benefit greatly from transcription to DSD for re-release.  But for new recordings, you would need to record to analog tape and mix on an analog deck if you wanted to create true DSD recordings.  Some boutique studios do follow this approach.

And as far as it goes, that sounds all well and good.  But then I came across an interesting paragraph in a 10-year-old technical paper from Philips in Holland (who, together with Sony, were the driving force behind SACD).  Here they talk about the typical “DAC” configuration used in a SACD player, and I was rather surprised by what I read.  According to this paper, the analog low-pass filters that are required to convert pure DSD to analog do not possess the impulse response characteristics they consider necessary for high-end audio performance.  However, digital low-pass filters are more than up to the task.  Therefore, the first thing the DAC does is use a digital low-pass filter to convert the DSD to a 2.822MHz multi-bit PCM signal.  This PCM signal is then fed into an SDM to generate a multi-bit (typically between 3 and 5 bits) SDM signal at 5.6MHz or even 11.3MHz.  This multi-bit SDM signal can finally be passed through an analog low-pass filter without having to sacrifice the impulse response characteristic.

So, who’d-a thunk it?  DSD gets converted to PCM and back again in the DAC of a SACD player!  It would be interesting to find out whether modern DSD DACs utilize a similar approach.  If so, then arguably, as well as there being no such thing as “Pure” PCM, there could be no such thing as “Pure” DSD either!
