Hello everyone,
I’m trying to understand how RMS audio levels are calculated, and I’ve been stuck for several days on how most audio software (Ardour, jkmeters, x42.meters, Adobe Audition) does this calculation. When I calculate the RMS value of a sine wave with a peak at 0 dBFS (max amplitude = 1.0f) myself, I get -3 dB, which is consistent with the definition of RMS: peak divided by the square root of two for a sine wave.

But in Ardour (and in all the software I tested), I get 0 dB. Why?

When I look at how open source software implements the RMS calculation, it looks like this:
Sum of (squares of the amplitude of the samples) multiplied by 2

double sum = 0.0;
for (unsigned int i = 0; i < rms_buffer_size; ++i) {
    /* each squared sample is doubled before it is added to the sum */
    sum += 2.0 * rms_buffer[i] * rms_buffer[i];
}
rms = sqrt(sum / rms_buffer_size);
rms_db = 20 * log10(rms);

Hello Saam, thank you for your response. I used a mono signal to do my testing, so I don’t think that’s it.

If I refer to this ITU document, page 9: "All output signal levels specified in this clause are relative to decibels relative to full scale (dBFS), where 0 dBFS represents the root mean square (RMS) level of a full-scale sinusoidal signal"

If I understand correctly, a sinusoid at maximum level (max sample = 1) must have an RMS level of 0 dBFS.

So the RMS calculation in my initial post is consistent with this definition, but I don’t understand what we are monitoring; it is no longer the amplitude…

Keep in mind that I only took a quick glance at that document:

No, it must have a peak level of 0 dBFS. In that document (which really only covers the analog I/O of mobile telephones for the most part) they are saying that levels will be referenced to the RMS value of a sine wave with a peak of 0 dBFS. This is not the same as an RMS value of 0 dBFS, which would result in peaks greater than 0 dBFS and significant distortion as a result, since 0 dBFS is the greatest value you can have without clipping.

By the way, in actuality I believe you cannot calculate the RMS of a complex audio waveform in a simple fashion, as you would need to break down the complex waveform into each of its components, which are constantly changing for audio, and then calculate the RMS of each component’s frequency and add them together. The measurements you see that are labeled “RMS” of a complex waveform are either a bit of BS (as they make some assumptions to get the value) or just completely mislabeled, as some have taken RMS to mean ‘average’. Worth a read here:

I too was confused about RMS. I have a Western philosopher’s mind. This is an extract from my notes from my research.

Audio Signal Power and Loudness

Electrical power = voltage**2 / electrical resistance

DeciBels are 1/10 of a Bel, which are a logarithmic
relative proportional difference according to the formula
log10( Power A / Power B)
= [by power proportion] log10( Voltage A ** 2 / Voltage B ** 2 )
= [by log rule transform] 2 * log10( Voltage A / Voltage B ).

Because signal voltage is easy to determine, decibels are typically
calculated as 10 * 2 * log10( Voltage A / Voltage B ).

As discovered by Fourier, all periodic waves can be represented by
combinations of sine waves. Sine waves are essentially atomic waves
as far as I can tell. In theory, if all the sine waves are known, they
can be mathematically evaluated individually and summed or something like that.
In practice, it seems a graphical approach is used on the arbitrary digital
signal, i.e. the almost calculus is done at the sampling rate of the audio signal.

What is the absolute average voltage of a sine wave?
Circumference = 2 * pi radians (an arc of one radian is as long as the radius).
Radians make integral math easiest.
Sine wave equation: y = sin( r * t ), where r is the angular frequency in
radians per unit of time (it sets the period, not the amplitude; the maximum amplitude here is 1).
Area under one curve: S[integral] from t=0 to t=pi/r of sin( r * t ) dt.
Derivative of sin(x) is cos(x), and the antiderivative of sin(x) is -cos(x).
Derivative of sin(rt) is cos(rt) * r
Use substitution. Let u = rt and du = r dt.
Get: S[integral] of sin(u) * 1/r * du = 1/r * S[integral] of sin(u) du
= difference of -cos(rt) / r at each limit
= -cos( pi )/r - -cos( 0 )/r
= 1/r + 1/r = 2/r
Average height = area under one curve / (pi/r) = (2/r) / (pi/r) = 2/pi
If the sine wave is scaled by a multiplicative constant, i.e. VMax * sin (rt), then the constant simply moves unchanged out of the integral to get an area under one curve of VMax * 2 / r and an average amplitude/voltage of VMax * 2 / pi.

However, audio engineers do NOT care about average voltage or amplitude. They care about the voltage of average power because power correlates to loudness, which humans perceive on an approximately logarithmic scale. Humans also perceive brightness on an approximately logarithmic scale; hence, gamma encoding and decoding.

Average power is proportional to the average of the voltage squared over
the time of a sine wave, and the voltage of average power is the square root
of that average: the area under the curve of the voltage squared, divided by
the time of one curve of its associated sine wave, and then square-rooted.

Area under one curve: S[integral] from t=0 to t=pi/r of (sin( r * t ))**2 dt.
Derivative of x ** n = n * x ** (n-1).
Antiderivative of x ** n is 1/(n+1) * x ** (n+1).
Use substitution. Let u = rt and du = r dt.
Get: S[integral] of (sin(u))**2 * 1/r * du.
Use substitution. Let v = sin(u) and dv = cos(u) du = sin(u + pi/2) du.
Get: S[integral] of v**2 * 1/r * 1/cos(u) * dv, which still mixes u and v. NOT SURE HOW TO PROCEED
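One standard way past this point, offered here as a sketch rather than as part of the original notes, is to skip the second substitution and use the half-angle identity sin(u)**2 = (1 - cos(2u)) / 2. Keeping u = rt and the same limits as above:

```latex
\[
\int_{0}^{\pi/r} \sin^{2}(rt)\,dt
  = \frac{1}{r}\int_{0}^{\pi} \sin^{2}(u)\,du
  = \frac{1}{r}\int_{0}^{\pi} \frac{1-\cos(2u)}{2}\,du
  = \frac{1}{r}\left[\frac{u}{2}-\frac{\sin(2u)}{4}\right]_{0}^{\pi}
  = \frac{\pi}{2r}
\]
\[
\text{mean of } \sin^{2} = \frac{\pi/(2r)}{\pi/r} = \frac{1}{2},
\qquad
V_{\mathrm{rms}} = V_{\max}\sqrt{\tfrac{1}{2}} = \frac{V_{\max}}{\sqrt{2}}
\]
```

The r cancels, just as it did for the average height, so the result does not depend on the frequency of the sine wave.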

However, it is asserted that the voltage of average power = VMax / (2 ** (1/2)). The voltage of average power is called the root-mean-square voltage; the name lists the steps of the sampled (almost calculus) calculation in reverse order. Each voltage sample is squared, a sequence of squared voltages is added and divided by their number to get a mean/average, and that mean is raised to the 1/2 power, i.e. input into the square root function.

Loudness Units relative to Full Scale (LUFS) are defined by a sample processing algorithm derived from dBFS, and the measurement is named according to the duration of the measurement window: Momentary LUFS, Short-Term LUFS, and Integrated LUFS.

My initial question was: where does this factor of 2 come from in the calculation of the RMS value on the dB FS scale. I will therefore stick to endolith’s response in this post cited above.

This definition of dBFS is explicitly designed such that the dBFS value of a full-scale sine wave equals 0 (and in consequence, that of a full-scale square wave is +3 dBFS). Since the RMS of the full-scale sine wave is 1/sqrt(2), multiplying rms(signal) by sqrt(2) ensures that the formula evaluates to 0 for the full-scale sine wave: 20*log10(rms(signal) * sqrt(2)) = 20*log10((1/sqrt(2)) * sqrt(2)) = 20*log10(1) = 0

It can be found in point 3.4 of this AES document.

My initial question was: where does this factor of 2 come from in the calculation of the RMS value on the dB FS scale.

I am no expert, but I don’t think the relative decibel scale position matters for the use of the factor of 2 with ‘power’ samples. The critical point is that LOUDNESS is proportional to ELECTRICAL POWER is proportional to VOLTAGE * VOLTAGE, which is the AMPLITUDE * AMPLITUDE.

Bels and Decibels are on a logarithmic scale and relate to relative POWER as POWER A divided by POWER B but input into a log base 10 function to make the relative POWER logarithmic relative POWER.

POWER is not easy to measure, but AMPLITUDE, that is VOLTAGE, is easy to measure. The ratio of the squared VOLTAGES, converted to a logarithmic scale, is the same as the ratio of POWER and LOUDNESS in Bels or Decibels, because the ELECTRICAL RESISTANCE (in Ohms) cancels out when one POWER (VOLTAGE squared / RESISTANCE) is divided by the other.

The extra multiplication by 2 in the decibel formula stands in for squaring the VOLTAGE before the division and before the log function. But with the sampling of VOLTAGES over time, the VOLTAGE sample is both squared and multiplied by 2. That’s weird in my non-expert opinion.

The sampling of (relative?) POWER over time (like area under the curve in Calculus) requires the sampling of VOLTAGE ** 2 (with constant factor per RESISTANCE not specified, but decibels are relative to some reference POWER value of the same resistance and that should cancel out with any decibel value).

The RMS VOLTAGE is the VOLTAGE of average POWER. The RMS VOLTAGE ** 2 / RESISTANCE = AVERAGE POWER.

If you draw a sine wave and also draw a horizontal line at VOLTAGE = AVERAGE VOLTAGE, then between any two consecutive 0-VOLTAGE crossings (where the wave crosses the horizontal line of zero volts in the middle of the sine wave), the area under the curve and the area under the line of AVERAGE VOLTAGE are the same.

If you draw a modified wave that is the square of the sine wave, you have a power wave. I think that if you draw a horizontal line at RMS VOLTAGE * RMS VOLTAGE, the areas would be equal because the area under the curve represents POWER and LOUDNESS.

The area under a curve is part of the Fundamental Theorem of Calculus. The idea of approximating the area under (or over) a smooth curve with vertical slivers of the same width is not difficult. Make the number of slivers infinite, and the approximation goes to the exact value (if, I suppose, everything is ‘well defined’).

After that, you need to understand what curve. Use the POWER curve not the VOLTAGE curve.

I don’t see a factor of 2 in the integral expression. The factor of 2 is for the Bel calculation between two unsquared VOLTAGES because it is logarithmic.

I am not convinced the times 2 is needed, but what do I know?

double sum = 0.0;
for (unsigned int i = 0; i < rms_buffer_size; ++i) {
    /* each squared sample is doubled before it is added to the sum */
    sum += 2.0 * rms_buffer[i] * rms_buffer[i];
}
rms = sqrt(sum / rms_buffer_size);
rms_db = 20 * log10(rms);

Why not just multiply the final sum by two? Doing it inside the loop repeats the same multiplication for every sample. How do you calculate RMS decibels without a reference VOLTAGE in the denominator? Apparently, the reference is the value 1 volt, or at least 1 (which suggests full scale to me, going by what Robin said next). If several open source calculations all include the factor of 2, then I don’t know where the misunderstanding is, but I know the factor of 2 does not fit my understanding of the theory of integral calculus for relative electrical POWER, and the webpage “Calculating the power of a signal”, linked above, is consistent with my understanding. I think it’s spurious, but I could be wrong.

Digital audio signal level is represented as floating point value in the range [-1 … +1] where an absolute value of 1.0 is the max possible amplitude (values above that will be clipped).

This signal level corresponds to Voltage.

Yep, and since the rms_sum value is always positive it can be even taken out of the sqrt() or the log().

Assuming constant impedance:

Power ratio = 10 log (P1/P2) = 10 log (V1/V2)^2 = 20 * log (V1/V2)

@Robin, just to be clear for everyone, I think you mean V1^2/V2^2 and not the square of the log. This makes no sense to me:

Yep, and since the rms_sum value is always positive it can be even taken out of the sqrt() or the log()

RMS is the procedure in reverse order: (1) square each sample and sum, (2) divide by the count, (3) take the square root. Sample values are squared and summed first and can’t be taken out. The final RMS voltage can be manipulated, I suppose. I am just expressing my understanding, not giving a definitive opinion. I don’t have one and don’t wish to do the research and study to get one. Maybe my comments are useful, but they come with no warranty.