44.1 vs 48 vs 88.2 vs 96

@seablade: my guess as to why it’s 48kHz would be that film is usually 24 fps, which would make things nicely divisible.

As for 44.1 vs 48, my take is that even if it might be practically inaudible, the accuracy of the resulting waveform when mixing multiple tracks together should make 48kHz the right choice.
I don’t think the downsampling makes it any more inaccurate than the rounding errors from multiple 44.1kHz tracks mixed together. After all, libsamplerate is supposed to be very accurate.

And didn’t your math teacher always tell you not to round until you had calculated the result?

That said, nowadays, with all the bitcrunching in mastering and the kids listening to it through crappy PC speakers, you could probably get away with 22kHz/8bit …

@Seb: I would suggest resampling the impulse responses to 44.1 kHz and then working at 44.1 kHz for the whole project, rather than running the project at 48 kHz just to be compatible with the 48 kHz impulses.
With a 48 kHz project you eventually have to resample the whole project; the other way you are just resampling your impulse files.

Just my opinion
Erdie

One of the advantages of 48 kHz (and especially 88.2 or 96) is that it reduces distortion and audible aliasing. To understand this, an explanation of the Nyquist frequency is necessary. The Nyquist frequency is the highest frequency that can be represented at a particular sample rate. Since the minimum number of samples required to represent a single frequency is two per cycle (one for the positive half and one for the negative), a 20 kHz tone needs at least a 40 kHz sample rate. We use sample rates such as 44.1 kHz because the corresponding Nyquist frequency is 22.05 kHz, roughly the upper threshold of human hearing (at least during early childhood).
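To put numbers on that, a quick check of the rates discussed here:

```python
# Nyquist frequency = half the sample rate: the highest frequency that rate can represent.
for fs in (44100, 48000, 88200, 96000):
    print(f"{fs} Hz -> Nyquist {fs / 2:.0f} Hz")
# 44100 Hz -> Nyquist 22050 Hz   (just past the ~20 kHz limit of hearing)
# 48000 Hz -> Nyquist 24000 Hz
# 88200 Hz -> Nyquist 44100 Hz
# 96000 Hz -> Nyquist 48000 Hz
```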

So if 44.1 kHz completely covers the range of hearing, what’s the problem? As a waveform approaches the Nyquist frequency, it is represented by fewer and fewer samples. Graphically, it begins to resemble a staircase, and at the Nyquist frequency itself (remember, only two samples) it is a pure square wave. If you’ve heard a square wave before you know how harsh and distorted it sounds.

Also, frequencies above the Nyquist frequency cannot be correctly represented (because the sample rate is not fast enough), so you end up with incorrect samples mixed in that don’t belong, creating phantom noise known as aliasing. Typically, low-pass filters are used to remove content above the Nyquist frequency before the signal is sampled, but this is a gradual roll-off, not a brick-wall filter.
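Here is a small sketch of that fold-down effect, assuming no anti-aliasing filter at all (the 30 kHz tone is an arbitrary choice for illustration):

```python
import numpy as np

fs = 44100           # sample rate
f_in = 30000         # tone well above the 22050 Hz Nyquist frequency
n = fs               # one second of samples, so FFT bins are 1 Hz apart
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f_in * t)

spectrum = np.abs(np.fft.rfft(x))
print(np.argmax(spectrum) * fs / n)   # ~14100 Hz: the tone folds down to fs - f_in
```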

So even though a 44.1 kHz sample rate includes all the frequencies we can hear, the quality of the upper end of that range progressively degrades. This is the range where the sizzling upper harmonics of instruments such as cymbals and muted trumpets reside. Just ask any audiophile about the high-end response of vinyl versus compact disc. Most of us have been exposed to enough rock concerts and power tools to have permanently numbed that part of our hearing, so we don’t really notice it. But when you consider that distortion is fed into each DSP effect you use and included in every calculation when channels are mixed, it gets distorted further and processed into all sorts of unexpected artifacts, thus amplifying the problem. By using a 48 kHz sample rate, we’re shifting all of that chaos up and out of the range of human hearing. That’s why it’s so difficult to discern the difference between 44.1 and 48. What you’re listening for is the absence of artifacts that occur near the Nyquist frequency, not the inclusion of higher frequencies that a typical human can’t even hear.

If you start with 48 and stick with it throughout the tracking/mixing/mastering processes, you will avoid the unwanted artifacts inherent to digital recording. But if you’re eventually downsampling to 44.1 kHz for CD, does that negate all the measures you’ve taken to preserve the quality up to this point? I’ve heard one of my professors propose this argument, but I will unashamedly admit I don’t hear it, and I don’t know anyone else who can. Ultimately, any degradation is probably due more to the quality of the anti-aliasing filter used prior to downsampling than to the non-integer relationship between the two sample rates.

Then again, I’ve got a bit of tinnitus due to a rather unsportsmanlike paintball gun maneuver, so bear in mind my advice tends to be more theoretical than practical.

Thanks duffrecords. I found a few videos on YouTube to illustrate this for the visually minded.

Short:

Long:
http://www.youtube.com/watch?v=VF2DHsJmf7s

From what I understand so far, you need exactly twice the sample rate of a frequency to be able to reproduce the square wave of that frequency, assuming that the samples are taken at peak and trough. If the samples are taken where the waveform crosses zero, then the recorded waveform is flat. If the samples are taken between peak and zero, then the waveform is recorded as a square wave with less amplitude. If you sample at slightly above or below twice the frequency, then you get a horrible modulation added to the recording. This is often the case because we do not record perfect high frequencies in relation to the sample rate. In order to prevent aliasing artifacts from appearing in our recordings, we should probably sample at 5 times the highest frequency that we are recording for a lo-fi recording and much higher for a hi-fi recording.

@matt_fedora: I think there are a few misunderstandings here…

The sampling theorem (which is what we are really talking about here) states that:

A band-limited analogue signal that has been sampled can be perfectly reconstructed from an infinite series of samples if the sampling rate EXCEEDS 2B samples per second where B is the highest frequency component of the original signal.

So, when you talk about (re-)constructing a square wave at precisely 1/2 Fs where Fs is the sample rate, you are correct that it cannot be done, but that is because a square wave of frequency 1/2Fs has significant energy at frequencies far in excess of its fundamental frequency (that’s what makes it a square wave and not a sine wave…) and thus violates the sampling theorem in this case.
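To make that concrete, here is a rough sketch of the harmonic content of an ideal square wave (picking 20 kHz as an arbitrary fundamental for illustration):

```python
import numpy as np

f0 = 20000  # fundamental of a hypothetical square wave, Hz
# Fourier series of an ideal square wave: odd harmonics only, amplitude 4/(pi*k)
for k in (1, 3, 5, 7):
    print(f"{k * f0:>6d} Hz  amplitude {4 / (np.pi * k):.3f}")
#  20000 Hz  amplitude 1.273
#  60000 Hz  amplitude 0.424   <- already far above any audio Nyquist frequency
# 100000 Hz  amplitude 0.255
# 140000 Hz  amplitude 0.182
```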

In a completely band-limited system, an anti-alias filter at the ADC prevents significant energy at frequencies at or above 1/2 Fs from entering the system. So, obviously, if a square wave close to 1/2 Fs is presented to such a system then it won’t look much like a square wave when it’s reconstructed, but as previously stated this is because such a wave only has that form because of the other harmonics. However, I would be surprised if most people can distinguish the difference between a square and a sine wave at such extreme frequencies (or even hear them at all), so the debate about whether the sample rate should be high enough to capture such extreme frequencies is largely irrelevant.

Problems can occur if a system creates signals with significant energy at or above 1/2Fs as part of its signal processing, as when these are reconstructed (in the DAC), they will violate the sampling theorem, and these frequency components will manifest as aliases, which will fold down below 1/2Fs and become audible. Simple DSP processing such as filtering or level changes will not normally cause this, but any type of wave-shaping (or clipping) almost certainly will, which is why most correctly designed DSP code will do this at a higher sample rate internally, and then decimate to the correct Fs using an appropriate filter before the output, thereby removing the aliases.
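By way of illustration, a minimal sketch of that oversample / process / decimate pattern (the 4x factor, the clip threshold and the function name are my own choices, not from any particular plugin):

```python
import numpy as np
from scipy.signal import resample_poly

def clip_with_oversampling(x, factor=4, threshold=0.5):
    """Hard-clip at a higher internal rate, then decimate back down.

    resample_poly applies a polyphase anti-alias filter on the way up and
    down, so harmonics that the clipping creates above the original Nyquist
    frequency are filtered out instead of folding back as aliases.
    """
    up = resample_poly(x, factor, 1)         # oversample by `factor`
    up = np.clip(up, -threshold, threshold)  # the non-linear step
    return resample_poly(up, 1, factor)      # decimate back to the original rate

fs = 44100
t = np.arange(fs) / fs
out = clip_with_oversampling(np.sin(2 * np.pi * 10000 * t))
```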

Some bad information here. A sampled sine wave does not degrade to a square wave as it approaches the Nyquist frequency. You are ignoring the whole process by which audio is reconstructed from sampled data. Although it looks counterintuitive, even with only two data points per cycle a perfectly good sine wave can be reconstructed, right up to the Nyquist frequency.

Good information here: http://www.lavryengineering.com/documents/Sampling_Theory.pdf

@projectMalamute: I don’t know if you are referring to my post or to parts of this thread in general, but I would agree that others have posted incorrect information. However, what I was trying to say was NOT that a sampled sine wave degrades to a square wave (it does not); I was trying to point out that you can, as the sampling theorem states, perfectly reconstruct a band-limited signal from an infinite series of samples if the sampling rate EXCEEDS 2B samples per second, where B is the highest frequency component of the original signal.
A square wave of fundamental frequency 1/2 Fs (where Fs is the sampling frequency) contains frequency components with significant energy far in excess of 1/2 Fs, which is why a digital system will have trouble reproducing it correctly; a pure sine wave, however, can be properly reconstructed, provided its frequency is below 1/2 Fs.

@linuxdsp: It was not your post I was referring to. It was this:

"So if 44.1 kHz completely covers the range of hearing, what's the problem? As a waveform approaches the Nyquist frequency, it is represented by fewer and fewer samples. Graphically, it begins to resemble a staircase, and at the Nyquist frequency itself (remember, only two samples) it is a pure square wave. If you've heard a square wave before you know how harsh and distorted it sounds."
Which is a common and understandable misconception. It is not intuitively obvious that a sine wave can be reconstructed from so little data, but it really can. The Lavry paper I linked to is good reading on the subject. A sine wave at 21K sampled at 44.1k is absolutely reconstructed as a sine wave, not a square wave.
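For anyone who wants to convince themselves numerically without working through the maths in the Lavry paper, here is a rough sketch (using scipy’s polyphase resampler as a stand-in for a DAC’s reconstruction filter):

```python
import numpy as np
from scipy.signal import resample_poly

fs, f, n = 44100, 21000, 44100          # 21 kHz tone, only ~2.1 samples per cycle
t = np.arange(n) / fs
samples = np.sin(2 * np.pi * f * t)

# Band-limited interpolation to 8x the rate, roughly what a reconstruction filter does
reconstructed = resample_poly(samples, 8, 1)

spectrum = np.abs(np.fft.rfft(reconstructed))
print(np.argmax(spectrum) * (8 * fs) / len(reconstructed))
# ~21000 Hz: one clean spectral line, i.e. a sine wave, not a staircase or square wave
```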

Another point: moving from 44.1 to 88.2 or from 48 to 96 gives you an octave of extra bandwidth. One could argue that this allows for a less artifact-prone anti-aliasing filter and thus cleans up the top octave. Moving from 44.1 to 48 has no such advantage; it only extends your bandwidth by about a semitone and a half.
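For reference, the arithmetic behind those two claims (plain equal-tempered semitones):

```python
import math

def semitones(ratio):
    # interval between two frequencies, in equal-tempered semitones
    return 12 * math.log2(ratio)

print(semitones(88200 / 44100))  # 12.0  -> exactly one octave of extra bandwidth
print(semitones(96000 / 48000))  # 12.0
print(semitones(48000 / 44100))  # ~1.47 -> only about a semitone and a half
```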

@linuxdsp: It was not your post I was referring to. This forum software does not seem to allow one to quote another's message. It was this:

Try using the BLOCKQUOTE tags as listed below the text entry;)

  Seablade
projectMalamute: A sampled sine wave does not degrade to a square wave as it approaches the Nyquist frequency.
You got it the wrong way. A square wave is constructed from a sine with added (higher-frequency) harmonics. So, by the theorem, the reconstructed analog signal of a sampled square wave at 20kHz will be more or less exactly the same as a sine, since all the harmonics that make up the square haven't been sampled in the first place.

No, I really don’t.

By the theorem a square wave sampled close to the Nyquist frequency would be reconstructed as a sine wave and a whole ton of inharmonically related garbage due to aliasing.

On the other hand, a sine wave sampled near the Nyquist frequency can be reconstructed as a sine wave, not as some sort of staircase or square wave as was being claimed.

Ah, Mea Culpa.
I missed the “not” in the quoted sentence :slight_smile:

I think there is a difference between 88.2k and 44.1. It has to do with subharmonic distortion. When two different notes are played, like a perfect 4th, the harmony produces an additional lower note that is quite audible. The note produced is a 5th below the lower of the two original notes. I would think that this creates the need for headroom for high-frequency harmonic content so the proper subharmonics can be produced, which would better reflect real sound. These subharmonics could produce a whole other series of natural artifacts. Bob Ludwig insisted that mastering should be done at 24 bit, 96k. The music that I record is not that critical and I’m not that fussy, so I record at 44.1k, 24 bit, but I bet a conductor or a real audiophile can hear the difference. I think I can hear more punch and solid imaging at higher rates, but I am quite deaf, to be honest, from playing electric guitar every weekend. I noticed that some interfaces like to work at certain rates. Some gear will work at unsupported rates, like my cheesy Creative xfi: it doesn’t support 44.1 for recording but does it anyway, with sputters and dropouts, whereas my Apogee runs smooth as silk at any rate I throw at it.
A western-tuned 3rd produces the 4th below the root, so you end up with a chord. I think some cool stuff happens up there at 24k or so that affects 16k, for example.
thanks

@soybalm: Subharmonics are not an issue related to sample rate. As has been mentioned previously in this thread, the chief difference with going to higher sample rates is the nature of the brickwall filter that is used to prevent aliasing distortion (where higher frequencies get wrapped around to appear in the bottom of the frequency range). The higher the sample rate, the “better” the brickwall filter can be, which means less aliasing, which means less distortion of the original sound.
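To put rough numbers on the “better brickwall filter” point (assuming a 20 kHz passband edge for illustration):

```python
# Room left for the anti-alias filter to roll off between the top of the
# audio band and the Nyquist frequency, at each sample rate.
passband = 20000
for fs in (44100, 48000, 88200, 96000):
    print(f"fs={fs}: transition band {fs / 2 - passband:>7.0f} Hz wide")
# fs=44100: transition band    2050 Hz wide  -> needs a very steep "brickwall"
# fs=48000: transition band    4000 Hz wide
# fs=88200: transition band   24100 Hz wide  -> a much gentler filter will do
# fs=96000: transition band   28000 Hz wide
```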

As for claims that people can hear the difference: there still (to my knowledge) are no double-blind studies that show this, only a lot of noise from people who have never done a double-blind study.

With a sample rate of just above 2 x 20KHz (i.e. a Nyquist frequency just above 20KHz), a 20KHz sine wave can theoretically be reproduced in its entirety from as little as two samples per cycle. Any other 20KHz waveform cannot (well, possibly a square wave), but that’s beside the point. At these high frequencies the human ear can only hear sine waves anyway. To be able to decipher a square wave or triangle wave etc. at those frequencies, the human ear would need to be able to hear the relevant harmonics (in other words, frequencies well in excess of 20KHz, which it can’t). In audio, the Nyquist reconstruction of waveforms close to the cutoff frequency explicitly relies on the fact that the original signals can’t possibly contain harmonic content which is capable of being heard.

But there’s a catch… when the sampling frequency is only just above 2 x the highest required frequency, practical reconstruction (i.e. decoding) of high-frequency sine waves needs to rely on the fact that filters “ring” if you pulse them at frequencies close to their cutoff frequency. Thus, the quality of the reconstructed signal is very heavily dependent on the quality of the filter design. At a higher sampling frequency, filters can be used within the part of their spectrum that doesn’t rely on ringing (and can instead rely on having a larger number of samples from which to reconstruct the waveform). Increasing the sample frequency can thus reduce stress on the decoding filter, which should theoretically improve high-frequency linearity. It’s quite possible that this might (might) be audible to a few people.

It’s very unlikely, though, that 48KHz gives any better audible performance than 44.1KHz, given the same replay hardware, but I can at least provide the answer to this point:

From Seablade:

48kHz was used for syncing to film, I can’t find a good article explaining exactly why and am not sure I could explain it very well from memory at the moment,

48KHz is mostly favoured because it offers improvements for film and video frame rates. 240KHz (48KHz x 5) divides to a very high degree of accuracy by 24, 25, 30 and 29.97.
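A quick check of those numbers (using 30000/1001 as the exact NTSC “29.97” rate):

```python
# Samples per frame at 48 kHz and at 240 kHz (= 48 kHz x 5)
for fps in (24, 25, 30, 30000 / 1001):
    print(f"{fps:7.3f} fps: {48000 / fps:8.1f} samples/frame at 48 kHz, "
          f"{240000 / fps:7.1f} at 240 kHz")
#  24.000 fps:   2000.0 samples/frame at 48 kHz, 10000.0 at 240 kHz
#  25.000 fps:   1920.0 samples/frame at 48 kHz,  9600.0 at 240 kHz
#  30.000 fps:   1600.0 samples/frame at 48 kHz,  8000.0 at 240 kHz
#  29.970 fps:   1601.6 samples/frame at 48 kHz,  8008.0 at 240 kHz
```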

At least from the perspective of the converters it shouldn't in theory, but there is a practical situation where it does: a lot of consumer audio hardware (especially cheap sound cards) can only operate at 48KHz, I presume due to cheapness and the fact that the two rates aren't related by a small integer, so it's difficult to derive them from a single oscillator. To play 44.1k material these devices will resample, usually with a very low-quality interpolation (see the previously mentioned cheapness). Running these devices at their native rate can greatly improve the output quality.

This isn’t quite true. Although the final conclusion is roughly correct, these devices do not resample in hardware. Perhaps it makes no functional difference to users that it’s the device drivers that take care of it, not the hardware, but it does have the important consequence that the quality of the resampling is malleable. If you were using the right software, for example, it would never deliver material at 44.1kHz to a 48kHz device, but would have resampled it already, perhaps using a very high quality algorithm to do so. On the other hand, it is true that the quality of the sample rate conversion found in the basic device driver framework on Linux (ALSA) is not very good. But then again, it’s entirely possible to use a different SRC if desired, and bypass the default stuff.
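To illustrate doing the conversion in software before the audio ever reaches a 48kHz-only device, here is a rough sketch (using scipy rather than any specific driver; in practice a library like libsamplerate would do this job):

```python
import numpy as np
from scipy.signal import resample_poly

# 48000/44100 reduces to 160/147, so a polyphase resampler can do the
# conversion exactly; the quality depends entirely on its internal filter.
fs_in = 44100
t = np.arange(fs_in) / fs_in
audio_44k1 = np.sin(2 * np.pi * 1000 * t)       # one second of a 1 kHz tone
audio_48k = resample_poly(audio_44k1, 160, 147)
print(len(audio_44k1), len(audio_48k))          # 44100 -> 48000 samples
```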

"A lot of consumer audio hardware (especially cheap sound cards) can only operate at 48KHz, I presume due to cheapness and the fact that the two rates aren't related by a small integer, so it's difficult to derive them from a single oscillator. To play 44.1k material these devices will resample, usually with a very low-quality interpolation (see the previously mentioned cheapness). Running these devices at their native rate can greatly improve the output quality."

Yes, this has crossed my mind a few times. I currently use an M-Audio Fast Track Pro and the specs are:

Digital Audio Interface Specifications > 48kHz sampling rate unless otherwise stated

http://www.m-audio.com/products/en_us/FastTrackPro.html

So I should get a better response from it running at 48 kHz, which would be its native sample rate, right? That way I’m not forcing or stressing it to resample the audio during tracking and mixing.

EDIT: hmmmm, post I replied to has been deleted?

There’s another reason for using higher sampling rates for digital processing.

Any process that introduces non-linearities creates harmonics. Deliberate distortion generators are extreme cases, but more common examples are dynamic range processors: compressors, expanders, limiters. While the gain is being changed, sidebands are added to the signal, and a rapid gain change may add sideband material that is above Fs/2. In the digital domain (or strictly speaking in any sampled system), those products are aliased back into frequencies below Fs/2, and when Fs is 44.1kHz you only need harmonics a little above 24 kHz before the alias products have a good chance of being audible.

The worst example is clipping, which if it happens in the digital domain can result in strong, very unpleasant-sounding, harmonically unrelated alias products that wouldn’t be there if the same clipping were done in the analogue domain.

Use of high sample rates moves these alias products to higher frequencies; it would then take much more severe distortion, with higher-frequency harmonic energy, to produce audible aliases.

This is also the reason why analogue inputs should be clipped in the analogue domain before hitting the anti-aliasing filters, so that if clipping does occur, the out-of-band harmonics are removed by the anti-aliasing filter. Not that there’s any excuse for clipping a 24 bit input stage, of course :slight_smile:

Eq and reverb operating normally don’t generate the harmonics that create this trouble; it’s mostly dynamic range processors.
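Here is a small demonstration of the clipping case (the 15 kHz tone and the clip level are arbitrary values for illustration):

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
clipped = np.clip(np.sin(2 * np.pi * 15000 * t), -0.3, 0.3)  # clip in the digital domain

spectrum = np.abs(np.fft.rfft(clipped))
freqs = np.fft.rfftfreq(len(clipped), 1 / fs)
print(freqs[spectrum > 0.01 * spectrum.max()])
# Besides 15 kHz there is now energy at e.g. ~900 Hz and ~13.2 kHz: the 45 kHz
# and 75 kHz harmonics of the clipped wave, folded back below Nyquist and no
# longer harmonically related to the original tone.
```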

And if I do want to track @ 44.1 KHz but this interface works internally @ 48 KHz then I should enable dithering in jackd, is that correct?

@joegiampaoli: Dithering is a method normally used to compensate for the inaccuracies caused when converting to a different resolution (bit depth), not a different sample rate, so it won’t help with resampling.
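By way of illustration, a minimal sketch of what dither actually does, i.e. TPDF dithering when reducing bit depth (the function name and scaling are my own):

```python
import numpy as np

def to_int16(x, dither=True):
    """Quantise float audio in [-1, 1] to 16-bit, optionally with TPDF dither.

    The dither is ~1 LSB of triangular noise added before rounding, so the
    quantisation error becomes benign noise rather than correlated distortion.
    """
    scaled = x * 32767.0
    if dither:
        scaled = scaled + (np.random.uniform(-0.5, 0.5, scaled.shape)
                           + np.random.uniform(-0.5, 0.5, scaled.shape))
    return np.clip(np.round(scaled), -32768, 32767).astype(np.int16)

# e.g. a very quiet tone, where dither makes a real difference after quantisation
quiet = 0.0001 * np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
print(to_int16(quiet)[:8])
```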