Calculating RMS in digital audio

@lherg, You said:

My initial question was: where does this factor of 2 come from in the calculation of the RMS value on the dB FS scale.

I am no expert, but I don’t think the relative decibel scale position matters for the use of the factor of 2 with ‘power’ samples. The critical point is that LOUDNESS is proportional to ELECTRICAL POWER is proportional to VOLTAGE * VOLTAGE, which is the AMPLITUDE * AMPLITUDE.

Bels and Decibels are on a logarithmic scale and express relative POWER: POWER A divided by POWER B, with the ratio passed through a log base 10 function so that relative POWER becomes logarithmic relative POWER.

POWER is not easy to measure, but AMPLITUDE, that is VOLTAGE, is easy to measure. The ratio of the squared VOLTAGES, converted to a logarithmic scale, is the same as the ratio of POWER and LOUDNESS in Bels or Decibels, because the ELECTRICAL RESISTANCE (in Ohms) cancels out when one VOLTAGE squared is divided by the other VOLTAGE squared.

The extra multiplication of the squared AMPLITUDE or VOLTAGE is to make the VOLTAGE squared before the division before the log function. But with the sampling of VOLTAGES over time, the VOLTAGE sample is both squared and multiplied by 2. That’s weird in my non-expert opinion.

The sampling of (relative?) POWER over time (like area under the curve in Calculus) requires the sampling of VOLTAGE ** 2 (with constant factor per RESISTANCE not specified, but decibels are relative to some reference POWER value of the same resistance and that should cancel out with any decibel value).

The RMS VOLTAGE is the VOLTAGE of average POWER. The RMS VOLTAGE ** 2 / RESISTANCE = AVERAGE POWER.

If you draw a sine wave and also draw a horizontal line at VOLTAGE = AVERAGE VOLTAGE, then between any two consecutive 0-VOLTAGE intersects (the points where the wave crosses the horizontal line of zero volts in the middle of the sine wave) the area under the curve and the area under the line of AVERAGE VOLTAGE are the same.

If you draw a modified wave that is the square of the sine wave, you have a power wave. I think that if you draw a horizontal line at RMS VOLTAGE * RMS VOLTAGE, the areas would be equal because the area under the curve represents POWER and LOUDNESS.

The area under a curve is part of the Fundamental Theorem of Calculus. The idea of approximating the area under (or over) a smooth curve with vertical slivers of the same width is not difficult. Make the number of slivers infinite, and the approximation goes to the exact value (if, I suppose, everything is ‘well defined’).

After that, you need to understand what curve. Use the POWER curve not the VOLTAGE curve.

I don’t see a factor of 2 in the integral expression. The factor of 2 is for the Bel calculation between two unsquared VOLTAGES because it is logarithmic.

I am not convinced the times 2 is needed, but what do I know?

for (unsigned int i = 0; i < rms_buffer_size; ++i) {
           sum+= 2.0*rms_buffer[i]*rms_buffer[i];
}
rms = sqrt(sum/rms_buffer_size);
rms_db = 20*log10(rms);

Why not just multiply the final sum by two? The computation is repetitive. How do you calculate RMS decibels without a reference VOLTAGE in the denominator? Apparently, it’s the value 1 volt or at least 1 (suggests full scale to me going by what Robin said next). If several open source calculations all include the factor of 2, then I don’t know where the misunderstanding is, but I know the factor of 2 does not fit my understanding of the theory of integral calculus for relative electrical POWER, and the webpage “Calculating the power of a signal”, linked above, is consistent with my understanding. I think it’s spurious, but I could be wrong.


Key is dBFS - decibel relative to Full Scale.

Digital audio signal level is represented as floating point value in the range [-1 … +1] where an absolute value of 1.0 is the max possible amplitude (values above that will be clipped).

This signal level corresponds to Voltage.

Yep, and since the rms_sum value is always positive it can be even taken out of the sqrt() or the log().

Assuming constant impedance:

Power ratio = 10 log (P1/P2) = 10 log (V1/V2)^2 = 20 * log (V1/V2)

@Robin, just to be clear for everyone, I think you mean V1^2/V2^2 and not the square of the log. This makes no sense to me:

Yep, and since the rms_sum value is always positive it can be even taken out of the sqrt() or the log()

RMS is the procedure in reverse order: (1) sum, (2) divide by the count, (3) take the square root. Sample values are summed first and can’t be taken out. The final RMS voltage can be manipulated, I suppose. I am just expressing my understanding, not giving a definitive opinion. I don’t have one and don’t wish to do the research and study to get one. Maybe my comments are useful, but they come with no warranty.


indeed. I should have added a bracket 10 log ( (V1/V2)^2 )

(though log has a lower precedence and square of the log would be notated log^2 (…))

Ah. I needed the baby steps, Robin: (X * X)/(Y * Y) = (X/Y)*(X/Y).
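For anyone else who wants the baby steps written out, here is the derivation as a short sketch. It assumes both powers are measured across the same resistance R, which is the "constant impedance" condition stated above; the symbols P_1, P_2, V_1, V_2 and R are just labels for the two powers, the two voltages and that shared resistance.

```latex
\begin{aligned}
P &= \frac{V^{2}}{R} \\
10\log_{10}\frac{P_1}{P_2}
  &= 10\log_{10}\frac{V_1^{2}/R}{V_2^{2}/R}
   = 10\log_{10}\left(\frac{V_1}{V_2}\right)^{2}
   = 20\log_{10}\frac{V_1}{V_2}
\end{aligned}
```

The R cancels in the second step, and the exponent 2 comes out of the logarithm as a multiplier, which is where the 20 comes from.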

No, the audio measurement standard AES-17 is explicit that dB FS is defined for amplitude, and is defined as 20log(signal_rms/full_scale_sine_rms).

Any measurement given in dB is a measurement of the signal relative to a reference value. The reference value for full scale in this case is a sine wave which just reaches full scale. As lherg pointed out the RMS value of a sine wave is 1/sqrt(2). Dividing by 1/sqrt(2) is of course the same as multiplying by sqrt(2), so you can re-write as 20log(signal_rms*sqrt2).

The factor of 2 pointed out in the first post is just algebraic simplification of where the “divide by 1/sqrt(2)” factor is used in the calculations.
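A minimal sketch of that definition, assuming samples are doubles in [-1, +1]. The function names block_rms, rms_to_dbfs and make_sine are made up for illustration and are not taken from any particular codebase.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain RMS (quadratic mean) of a block of samples in [-1, +1].
double block_rms(const std::vector<double>& samples) {
    assert(!samples.empty());
    double sum_sq = 0.0;
    for (double s : samples) sum_sq += s * s;
    return std::sqrt(sum_sq / static_cast<double>(samples.size()));
}

// dB FS as described above: 20*log10(signal_rms / full_scale_sine_rms),
// where the reference is the RMS of a sine that just reaches full scale.
double rms_to_dbfs(double signal_rms) {
    const double full_scale_sine_rms = 1.0 / std::sqrt(2.0);
    return 20.0 * std::log10(signal_rms / full_scale_sine_rms);
}

// Hypothetical test-signal helper: n samples spanning a whole number of
// sine cycles with the given peak amplitude.
std::vector<double> make_sine(std::size_t n, std::size_t cycles, double peak) {
    const double pi = std::acos(-1.0);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = peak * std::sin(2.0 * pi * cycles * i / n);
    return out;
}
```

With this, rms_to_dbfs(block_rms(make_sine(48000, 1000, 1.0))) comes out at 0 dB FS up to rounding, and the same sine at half the peak amplitude lands about 6 dB lower.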

Now for the practical part.

One advantage of moving this factor of 2 inside the log as sqrt(2) is that the (sum/rms_buffer_size) will also have a range of [0…1]. That value can be transmitted reliably (no -inf for silence).

Furthermore, a common function coefficient_to_db(v) → 20 log(v) can be used throughout the codebase.
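A rough sketch of that practical point, under the assumption that the meter passes the linear value around and only converts to dB at the display end. coefficient_to_db is the name used above; linear_level and level_to_dbfs are hypothetical names for this illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linear meter value: mean of 2*x^2, i.e. (signal_rms / full_scale_sine_rms)^2.
// It is finite for every input, including digital silence, where it is 0.0.
double linear_level(const std::vector<double>& buf) {
    assert(!buf.empty());
    double sum = 0.0;
    for (double s : buf) sum += 2.0 * s * s;
    return sum / static_cast<double>(buf.size());
}

// Shared amplitude-to-dB conversion, as named in the post above.
double coefficient_to_db(double v) {
    return 20.0 * std::log10(v);
}

// Conversion done only where the value is displayed. For level == 0.0 the
// logarithm yields negative infinity on IEEE-754 platforms.
double level_to_dbfs(double level) {
    return coefficient_to_db(std::sqrt(level));
}
```

The design choice is simply that the -inf only ever appears in the last step, so whatever sits between the DSP code and the display never has to handle it.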

you guys are saying exactly the same thing :slight_smile:

Yes, but I find the derivation of the 10log(power) vs 20log(amplitude) obscures the point that dB values always have a reference, and in this case the reference is explicitly the RMS of the amplitude of a full scale sine, so you need to divide the signal value by the reference sine value and take the log of that.

One of us has this backwards :slight_smile:

From a physics point of view, the reference is the signal power ratio, and only by applying Ohm’s law can you derive the log of squared voltage ratios (if R is constant). This way you actually calculate the RMS of a sine wave to be 1/sqrt(2).

And only then can the AES come along and tell you to put sqrt(2) in there to normalize it.

In other cases however it’s not as simple. e.g. for LUFS calculation you have to actually integrate and cannot just move a constant normalization factor into the log(). In LUFS loudness DSP you’ll find 10 log (…) for that reason (page 3 https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-5-202311-I!!PDF-E.pdf).

All that being said, for simple RMS, I agree that from an engineering point of view the AES definition is easier to grasp for casual people implementing it.

I will stoke the flames of confusion.

for (unsigned int i = 0; i < rms_buffer_size; ++i) {
           sum+= 2.0*rms_buffer[i]*rms_buffer[i];
}
rms = sqrt(sum/rms_buffer_size);
rms_db = 20*log10(rms);

So let’s consider a mathematical transformation (if I did it right):

for (unsigned int i = 0; i < rms_buffer_size; ++i) {
           true_sum+= rms_buffer[i]*rms_buffer[i];
}
rms = sqrt(2) * sqrt(true_sum/rms_buffer_size);
rms_db = 20*log10(rms);

What is the sqrt(2) for? I am not familiar with AES-17, but even if I were, does it make clear what the sqrt(2) is for? As Robin said, I believe, it is a factor one can apply TO A SINE WAVE’s maximum amplitude to get the RMS amplitude.

Now look at the algorithm. Are we sampling a sine wave? No, we are not. If we have a sine wave, we use exact math in the style of integration of calculus. Audio signals are much more irregular. I don’t think one may ‘normalize’ or convert from maximum amplitude to RMS amplitude with the constant sqrt(2) when one is sampling some generic audio signal.

Others may be a by-the-book person if they wish. I’ve read enough books and don’t read academic stuff any more than necessary. The question as I understand it was one of understanding the why of how it works or would work correctly, not in how to follow an algorithm given from on high. I realize all too well I am the oddball that way. I am not going to assess AES anything, so of course my opinion is just some questions that I think are relevant to a solid understanding of the original question. Good luck to everyone implementing audio signal processing.

But now I realize something. This from Chris is interesting:

No, the audio measurement standard AES-17 is explicit that dB FS is defined for amplitude, and is defined as 20log(signal_rms/full_scale_sine_rms).

What if we let full_scale_sine_rms = 1 / sqrt(2) ?
Then we get 20log(sqrt(2)*signal_rms) or:

rms = sqrt(2) * sqrt(true_sum/rms_buffer_size);
rms_db = 20*log10(rms);

READ THIS, lherg:
It would seem the repetitive ‘2’ is the unclear forward reference of the reference RMS amplitude of a maximum sized sine wave.
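To make the equivalence concrete, here is a small sketch that puts the two forms side by side. The function names are hypothetical; the first mirrors the loop from the first post and the second spells out the division by full_scale_sine_rms.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Variant from the first post: the factor 2 is applied to every squared sample.
double dbfs_factor_in_loop(const std::vector<double>& buf) {
    assert(!buf.empty());
    double sum = 0.0;
    for (double s : buf) sum += 2.0 * s * s;
    return 20.0 * std::log10(std::sqrt(sum / buf.size()));
}

// Same quantity with the reference written out explicitly:
// plain RMS of the signal divided by the RMS of a full-scale sine.
double dbfs_explicit_reference(const std::vector<double>& buf) {
    assert(!buf.empty());
    double true_sum = 0.0;
    for (double s : buf) true_sum += s * s;
    const double signal_rms = std::sqrt(true_sum / buf.size());
    const double full_scale_sine_rms = 1.0 / std::sqrt(2.0);
    return 20.0 * std::log10(signal_rms / full_scale_sine_rms);
}
```

Both functions should agree to within floating-point rounding for any non-silent buffer, whether or not the samples come from a sine wave, because the 2 never touches the signal; it only encodes the reference.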

A value in dB is not absolute, it is the logarithm of the ratio of a measured value to a reference value.
In the case of FS measurement, the “FS” refers to full-scale digital, and dB FS is defined to be the logarithm of the ratio of the RMS signal value to the RMS value of a sine wave which has a maximum amplitude of the maximum digital value (the maximum numeric value will vary depending on how many bits the audio samples contain).

Not quite. When you calculate the RMS value of a sine wave with peak amplitude of +1 and -1 the value is 1/sqrt(2).
That is the reference value, so when you calculate the ratio of the measured signal RMS value to the reference sine wave RMS value you calculate signal_rms/(1/sqrt(2)).
When you normalize that fraction you would write it as signal_rms*sqrt(2).

You are not converting max amplitude to RMS of the audio signal, you are calculating the ratio of the signal RMS value to the RMS value of a maximum amplitude sine wave.

When you calculate dB values it is always a comparison to a reference value. dB FS happens to use the RMS amplitude of a full scale sine wave as the reference, dBm uses 1mWatt as the reference, dBu uses the voltage which dissipates 1mW into 600 Ohms as the reference, dB SPL is the sound pressure level referenced to 20 micro pascals. There always has to be a reference value against which you are measuring.
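As a sketch of "there always has to be a reference", the helpers below take the reference as an explicit argument. The function and constant names are made up for illustration; the reference values are the ones listed in the post above, with the dBu voltage derived from P = V^2/R.

```cpp
#include <cassert>
#include <cmath>

// Amplitude-like quantities (voltage, sound pressure, sample values):
// 20*log10(value / reference).
double amplitude_db(double value, double reference) {
    assert(value > 0.0 && reference > 0.0);
    return 20.0 * std::log10(value / reference);
}

// Power quantities: 10*log10(value / reference).
double power_db(double value, double reference) {
    assert(value > 0.0 && reference > 0.0);
    return 10.0 * std::log10(value / reference);
}

// dBm: 1 milliwatt.
const double kDbmReferenceWatts = 1.0e-3;
// dBu: the voltage that dissipates 1 mW into 600 Ohm, V = sqrt(P*R), about 0.775 V.
const double kDbuReferenceVolts = std::sqrt(1.0e-3 * 600.0);
// dB SPL: 20 micropascals.
const double kDbSplReferencePascals = 20.0e-6;
// dB FS: RMS of a sine wave that just reaches full scale (peak 1.0).
const double kDbfsReferenceRms = 1.0 / std::sqrt(2.0);
```

For example, amplitude_db(1.0, kDbSplReferencePascals) gives roughly 94 dB SPL for a sound pressure of 1 pascal, and amplitude_db(signal_rms, kDbfsReferenceRms) is the dB FS value discussed in this thread.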

Well, yes, that is explicitly what I wrote in my previous comment.

2^1/2 is indeed a magical number of the RMS gods. RMS is an energy field created by all living digital audio sources. It surrounds us and penetrates us; it binds the DAW galaxy together. Those who can channel that energy are luminous beings, not crude matter such as myself. Its high priest knows the holy specs and vouchsafes to provide many corrective words with bureaucratic authority. His shoe’s latchet I am not worthy to unloose any more than my little mind can deliberate upon the magic of 2^1/2. Socrates was a devil! I can only revere the grand ordering principles of the galaxy that passeth all my understanding. Who am I to suppose causality by reason and context?

See my comment very early on in the thread:

What you are discussing are ways to get an approximation of RMS, when you can’t really calculate RMS of a non-repetitive waveform easily. As I mentioned before, RMS is a bit of a BS term in terms of complex waveforms, and is used as a marketing term for some audio manufacturers as well (it was used for speaker ratings for a long time, for instance). Thankfully there is a bit of a move away from this in larger format sound systems, as there were too many unknown assumptions in how such a number was calculated, but I still see it pop up.

    Seablade

Thanks for your comments, I removed the factor of two from my loop to reduce algorithmic complexity. Why carry out several multiplications when just one is enough :slight_smile:

  for (unsigned int i = 0; i < rms_buffer_size; ++i) {
    sum+= rms_buffer[i]*rms_buffer[i];
  }
  rms_dB_FS = 20*log10(sqrt(sum/rms_buffer_size) * sqrt(2.0));

Seablade, these rms calculation algorithms are however used in most audio software.

Even if this does not represent a physical reality, they are very practical and inexpensive in CPU to have an overview of the acoustic power of the signal (The ears are quadratic sensors). Doing an FFT, which is done in LUFS, unless I am mistaken, is certainly interesting in the mastering phase, but to monitor multiple inputs it does not seem necessary to me and is above all heavy in terms of calculation. The display of peaks and rms value (even with an imperfect calculation) is enough for me to see if the signal requires compression, or if I need to increase the gain of my preamps…

All that remains are tools that must be understood in order to master them well. In analog the meters are not perfect either: if you have a DC component, it is filtered by the capacitors and you therefore do not see it in your display. Sometimes you have to use your ears to mix!!

Correct, my point was that in some cases there may not be a reason that can be well defined as much as that is ‘most reflective’ of what is expected.

Seablade

lherg wrote:

Even if this does not represent a physical reality, they are very practical and inexpensive in CPU to have an overview of the acoustic power of the signal (The ears are quadratic sensors).

This sort of social ‘justification’ is ridiculous, like the social oneupmanship. I’ll understand my way because that is what works for me. I realize I am the outlier everywhere I go. Don’t care. I’m right for me. I cannot and am not trying to relate to the intellectuals here who, in my opinion, can’t even recognize much less articulate first principles. The first principles of this thread should be those needed to answer the original question given the ostensible function of this forum.

Here’s a first principle. The digital audio signal to be analyzed does not come with a known function f(t) or f(x) or whatever like the referential sin(t) or sin(x) does. Hence, there is no way to apply exact mathematical integral calculus. What we are doing when we sample and compute off of the digital marks that are the digital data is the approximation of an exact integral calculation that is the basic theoretical idea of integral calculus to derive the area under a curve. There is no f(x), but there are points on the otherwise unknown curve at regular intervals. The approximation is not nonsense or digital audio processing would not be amazing when done skillfully. The approximation at 192 kHz is damned great. It is integral calculus in essence. I doubt your human ears would recognize the difference between some ‘perfect’ function value we can’t determine and what we normally calculate. If close enough were not an engineering principle, a first principle, then there would not be engineering at all.
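A small sketch of that approximation idea, using a curve whose exact answer is known so the error can be checked. The function name approx_rms and the midpoint sampling are assumptions made for this illustration; real audio code just uses the samples it is given.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>

// Approximates sqrt( (1/T) * integral from 0 to T of f(t)^2 dt ) using n
// equal-width slivers, each evaluated at its midpoint. The sliver width
// cancels against T, leaving the familiar sum-of-squares divided by n.
double approx_rms(const std::function<double(double)>& f, double duration, std::size_t n) {
    assert(n > 0 && duration > 0.0);
    const double dt = duration / static_cast<double>(n);
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double v = f((static_cast<double>(i) + 0.5) * dt);
        sum_sq += v * v;
    }
    return std::sqrt(sum_sq / static_cast<double>(n));
}
```

For the non-periodic ramp f(t) = t on [0, 1] the exact RMS is 1/sqrt(3). With 10 slivers the result is already within about 0.001 of that, and with 1000 slivers the error is far smaller, which is the "make the number of slivers large" argument in code form.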

Furthermore, if someone who does that crazy sort of algorithmic manipulation with the early factor of two would just comment the code properly on the difficult stuff, this whole discussion would not be necessary. I don’t think most people know the important choices and the important details for competency from the other ones. It’s always lookie what I can do. Of Course, Robin is a very clear and insightful writer. Nevertheless, I did not understand anyone to have explained the answer. You can point to your stuff. It did not make sense to me.

First principles, people. If social rank is your first principles, go work at Boeing. There is no need to justify approximate empirical integral calculus, which is fundamental to the derivation of mathematical and exact integral calculus. Furthermore, I know what a man has between his legs, and I don’t give a damn about conventional wisdom such as exists in what I do not recognize as my heritage. I suggest to those capable, know thyself. First principles, Clarice. First principles. (Youtube video " First Principles / Simplicity by Dr Hannibal Lecter to Clarice / The Silence of the Lambs (1991)").

Is this a reference to Harry Potter?

The answer has been provided many times in this discussion thread: It is in the definition of dB FS by the AES: Align the 0 dB FS with the RMS value of a sine wave of maximum amplitude.

https://en.wikipedia.org/wiki/DBFS
The unit dB FS or dBFS is defined in AES Standard AES17-1998,[13] IEC 61606,[14] and ITU-T Recs. P.381[15] and P.382,[16] such that the RMS value of a full-scale sine wave is designated 0 dB FS. This means a full-scale square wave would have an RMS value of +3 dB FS.[17][18] This convention is used in Wolfson[19] and Cirrus Logic[20] digital microphone specs, etc.
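The square-wave figure in that quote can be checked with a short sketch. The names dbfs and full_scale_square are hypothetical; the only assumption is the convention quoted above, that the RMS of a full-scale sine is 0 dB FS.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// dB FS of a buffer, with the RMS of a full-scale sine (1/sqrt(2)) as reference.
double dbfs(const std::vector<double>& buf) {
    assert(!buf.empty());
    double sum_sq = 0.0;
    for (double s : buf) sum_sq += s * s;
    const double signal_rms = std::sqrt(sum_sq / static_cast<double>(buf.size()));
    return 20.0 * std::log10(signal_rms * std::sqrt(2.0));
}

// Full-scale square wave: every sample is +1 or -1, flipping sign every
// half_period samples.
std::vector<double> full_scale_square(std::size_t n, std::size_t half_period) {
    assert(half_period > 0);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = ((i / half_period) % 2 == 0) ? 1.0 : -1.0;
    return out;
}
```

Every sample of the square wave has magnitude 1, so its RMS is 1 and the result is 20*log10(sqrt(2)), about +3.01 dB FS.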

That is a misuse of the term. The dB FS measurement is specifically defined as an RMS measurement.
AES 17-2020 section 3.12.2 and 3.12.3 have this note:
“NOTE 2 Levels reported in FS are always rms. It is invalid to use FS for non-rms levels.”

That is somewhat like saying you do not believe addition and multiplication work on a non-repeating set of numbers. RMS is just a way to calculate a particular kind of mathematical average (quadratic mean, as the Wiki page you linked points out), the math doesn’t care if the numbers are periodic or not.

In fact the wikipedia page that you linked shows both the continuous time and discrete time formulas for calculating RMS, and it is pretty clear that there is no assumption of periodicity, since there is a section specifically describing simplifications for common periodic signals.

You have to start adding fine print when using RMS with non-repetitive signals, and of course it is possible for people to misinterpret what the numbers mean.
Since it is a type of average the period over which the signal is “averaged” (using that term loosely) will affect the result, so you always should specify the time period over which the RMS is calculated.
That is the same thing which is explicit in the EBU standards for loudness metering where EBU Tech-3341 calls out momentary, short-term, and integrated loudness using essentially the same calculation but over different time periods.
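A minimal sketch of why the time period has to be stated, using a hypothetical window_rms helper that computes RMS over an explicit range of samples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RMS over the window [start, start + length) of buf.
double window_rms(const std::vector<double>& buf, std::size_t start, std::size_t length) {
    assert(length > 0 && start + length <= buf.size());
    double sum_sq = 0.0;
    for (std::size_t i = start; i < start + length; ++i)
        sum_sq += buf[i] * buf[i];
    return std::sqrt(sum_sq / static_cast<double>(length));
}
```

Take a 100-sample buffer whose first 10 samples are 0.5 and whose remaining samples are 0. Over the first 10 samples the RMS is 0.5, over all 100 samples it is about 0.158, and over the last 90 it is 0. The signal is the same each time; only the window changed.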

That seems like one of the places where RMS is actually useful. RMS is the thermal equivalent voltage, and low frequency drivers are often thermally limited. The RMS rating of a low frequency driver should indicate the long term RMS signal level it can tolerate without overheating the voice coil due to dissipation in the wire resistance. Especially with modern speaker controllers which are fast enough to monitor instantaneous amplitude (to make sure excursion stays within limits), and RMS values at various time periods to keep track of thermal overload, it seems like you should be able to have a much better understanding of how close to the various limits the drivers are at any particular moment.

Was that because sound system vendors did not reference back to the relevant standards for measuring loudspeaker system components, or just because too many customers were not familiar with the relevant standards?
