Imported track is clipping

Hi everybody,

I just found something that looked weird to me but I think it could be because I’m pretty new to audio production and I still have to get certain things right…

I’m mixing a track I recorded with a friend, and I decided to download some tracks from YouTube to use as references while mixing. After downloading them, I imported the tracks into Ardour using the import function.
After that, I noticed that the imported tracks were clipping: I can see the red peaks on the region and, of course, I can hear terrible clipping in the audio.

So my question is: how come these tracks are clipping? Suppose I want to use these tracks as a reference for mastering (for example, to check the loudness or compression of my track compared to the reference…). I really don’t think I can base my comparisons on a clipping reference, because what I hear seems definitely wrong and I don’t want my track to sound like that at all…

Clearly, if I turn the fader of the imported tracks down, the clipping disappears, but I don’t know if this is the correct way to proceed, since I wouldn’t know HOW MUCH to turn the fader down to get a good reference. Suppose I want to compare my track’s loudness with the reference’s loudness: how much do I have to lower my fader? Every time I turn the fader down the loudness decreases, but what is the “right” loudness I want to get?

Could you help me to figure out this “weird” (to me) situation?

Thanks in advance

By default Ardour indicates clipping when the level is >= -0.1 dBFS (this can be configured in preferences).

If you import a file that is normalized to 0 dBFS, parts may show up as clipped.
However when playing it back it should be fine (except for inter-sample peaks).
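To put a number on that threshold, here is a minimal sketch of the dB arithmetic, assuming sample values are expressed as linear amplitudes with 1.0 as full scale. The function names are made up for illustration and are not part of Ardour.

```python
import math

CLIP_THRESHOLD_DBFS = -0.1  # default indicator threshold quoted above


def dbfs_to_linear(db):
    """Convert a level in dBFS to a linear amplitude (1.0 = full scale)."""
    return 10.0 ** (db / 20.0)


def linear_to_dbfs(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    if amplitude <= 0.0:
        return float("-inf")  # digital silence has no finite dB value
    return 20.0 * math.log10(amplitude)


def flagged_as_clipped(peak_amplitude, threshold_db=CLIP_THRESHOLD_DBFS):
    """True if a peak at this amplitude is at or above the threshold."""
    return linear_to_dbfs(peak_amplitude) >= threshold_db


print(round(dbfs_to_linear(CLIP_THRESHOLD_DBFS), 4))  # 0.9886
print(flagged_as_clipped(1.0))   # True: a file normalized to 0 dBFS
print(flagged_as_clipped(0.98))  # False: about -0.18 dBFS
```

So any sample above roughly 98.9% of full scale would be drawn as clipped, even though nothing has been lost in the file itself.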

Thanks for the reply!

Then this is weird… Because I tried listening to the region in the track (leaving the fader at its default position, which is at the level of the little white horizontal bar near the top of the fader) and it was anything but fine!

You should show a picture of your mixer settings. The level at the output is going to be a combination of the signal level in the file, any trim amplifier settings in the channel, the channel fader, the master bus fader, and the monitor section fader. That is a lot of places the signal level could be changed.
I forget, does Ardour display the peak values after importing? If not (or if you did not write them down), do you have ebumeter installed? That is a command line tool that will display signal levels. Or sndfile-info will show the peak sample value in a file. That would indicate the value in the original file you extracted from YouTube.
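Since gains expressed in dB simply add, the chain of stages listed above can be sketched as a sum. The stage values below are illustrative only and are not read from any real session; the function name is hypothetical.

```python
def output_peak_dbfs(file_peak_dbfs, *stage_gains_db):
    """Peak level after a chain of gain stages; gains in dB add up."""
    return file_peak_dbfs + sum(stage_gains_db)


# Illustrative settings (assumed, not taken from the poster's session):
stages = {
    "trim": 0.0,
    "channel fader": 0.0,
    "master bus fader": 0.0,
    "monitor section": 6.0,
}

peak = output_peak_dbfs(0.0, *stages.values())
print(peak)        # 6.0
print(peak > 0.0)  # True: over full scale at the output
```

With a file that already peaks at 0 dBFS, any stage left above unity pushes the output over full scale, which is why it helps to check each of those places in turn.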

Hi and thank you for answering!

This is a screenshot of my mixer window of the ardour session I’m using for mastering my track using the “Ref” track as a reference.

Looking at the editor view:

The clipping peaks (which appear only on the “Ref” track) can also be seen graphically.

I should also add that, unlike my first test, I imported the reference track into a brand new Ardour session and, listening to the imported “Ref” track, I noticed that the audio seemed better.
I thought that maybe the red peaks I see on the region are just peaks due to the 0 dBFS normalization of the file (as suggested by @x42), which don’t actually make the track sound horrible… But, as far as I can see, there are a lot of red peaks… Is that normal?

The only difference from my first test that I think might be relevant is that in my first test I had my monitor knob set to +6dB, so my audio level was actually boosted… Could that be what made the track sound so bad? And is it still correct for an imported, mastered track to show all these red peaks?

That is definitely clipped. Did you have to convert the sample rate when importing? If the original audio was sampled at 48kHz and you converted to 44.1kHz on import, the resampling filter can shift where the samples fall relative to the waveform. A file whose sample values peaked exactly at full scale, but whose “true peak” between samples was over full scale, can then end up with those over-full-scale peaks landing on the samples themselves. The concept is much easier to describe graphically. Basically yes, what you are seeing is pretty common with pop and rock music, but it is not good practice, and I thought that YouTube was normalizing the audio now, so it was not likely to occur.
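In place of a picture, here is a small numeric sketch of the inter-sample peak idea. It uses a contrived test signal (a sine at one quarter of the sample rate) and a half-sample shift of the sampling grid as a stand-in for what a resampler can do; it does not reproduce Ardour's actual resampler, and the function name is hypothetical.

```python
import math


def quarter_rate_sine(n_samples, phase):
    """Unit-amplitude sine at one quarter of the sample rate."""
    return [math.sin(0.5 * math.pi * n + phase) for n in range(n_samples)]


# Sampling grid that misses the crests: every sample is about +/-0.7071.
original = quarter_rate_sine(16, math.pi / 4)

# Normalize so the largest *sample* sits at full scale (1.0).
scale = 1.0 / max(abs(s) for s in original)
sample_peak = max(abs(s * scale) for s in original)

# The underlying waveform now has amplitude `scale`, so that is its true peak.
true_peak_db = 20.0 * math.log10(scale)

# Same waveform, grid shifted by half a sample: samples now hit the crests.
shifted = quarter_rate_sine(16, 0.0)
shifted_peak = max(abs(s * scale) for s in shifted)

print(round(sample_peak, 4))   # 1.0
print(round(true_peak_db, 2))  # 3.01
print(round(shifted_peak, 4))  # 1.4142
```

The normalized file reports a sample peak of exactly full scale, yet the waveform between the samples is about 3 dB higher, and moving the sample positions is enough to put that excess onto the samples themselves.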

How did you download and extract the audio, youtube-dl or something else? If you post a link I can check, but likely that is just how the audio was processed, compressed really heavily and boosted as much as possible. I would not recommend using that as your reference, it is considered bad practice and probably won’t sound good on services which normalize based on loudness value and not peak value, which is basically all of them. I know Spotify, Pandora, and Apple Music do, and I’m pretty sure even YouTube does for new submissions, but maybe not for existing material.

Hi again!

YES, I had to convert the sample rate when importing, exactly the way you described. The file was sampled at 48kHz and Ardour told me it had to convert it to 44.1kHz…

The service I used to obtain the track is this one

this, instead, is the YT video of the track I wanted to use as reference.

I was searching for a service which could allow me to download a .wav instead of a .mp3, just to maximize sound quality, but I don’t know if there are better services to obtain YT tracks…

Strange thing is that I tried to download (using the exact same service) and import different tracks, and I always found the same problem… How can it be?

Resampling is likely responsible for this. It can turn inter-sample peaks of the 48kHz file into actual digital peaks.

Can you check whether the same thing happens if you import the file into an Ardour session that uses a 44.1kHz sample rate?

Just checked with sndfile-info after converting to .wav:

Sample Rate : 44100
Frames      : 9634816
Channels    : 2
Format      : 0x00010002
Sections    : 1
Seekable    : TRUE
Duration    : 00:03:38.477
Signal Max  : 32768 (0.00 dB)

The file is normalized to digital peak. And doing a loudness analysis shows that it has a true-peak of 1.3 dBTP.

So yes, importing to Ardour using float, or resampling the file, may introduce clipping.

That is odd, there are 44.1k files available from YouTube directly:

format code  extension  resolution note
139          m4a        audio only DASH audio   49k , m4a_dash container, mp4a.40.5 (22050Hz), 1.27MiB
251          webm       audio only tiny  128k , webm_dash container, opus @128k (48000Hz), 3.35MiB
140          m4a        audio only tiny  129k , m4a_dash container, mp4a.40.2@129k (44100Hz), 3.37MiB
278          webm       256x144    DASH video   95k , webm_dash container, vp9, 30fps, video only
160          mp4        256x144    DASH video  108k , mp4_dash container, avc1.4d400b, 30fps, video only
242          webm       426x240    DASH video  220k , webm_dash container, vp9, 30fps, video only
133          mp4        426x240    DASH video  242k , mp4_dash container, avc1.4d400c, 30fps, video only
134          mp4        640x360    360p   65k , mp4_dash container, avc1.4d401e@  65k, 30fps, video only, 1.70MiB
243          webm       640x360    DASH video  405k , webm_dash container, vp9, 30fps, video only
135          mp4        854x480    480p   92k , mp4_dash container, avc1.4d401f@  92k, 30fps, video only, 2.41MiB
244          webm       854x480    DASH video  752k , webm_dash container, vp9, 30fps, video only
18           mp4        640x360    360p  179k , avc1.42001E, 30fps, mp4a.40.2 (44100Hz), 4.67MiB (best)

I think Opus only supports 48k, which is why format 251 would be 48k, but the m4a audio in format 140 is available as 44.1k at about the same bitrate.

Both the opus and m4a version have a peak value at 0 dBFS when converted to wav.
(sndfile-info output truncated to just the relevant info for clarity)

$ sndfile-info Cristina_Davena_opus_convert.wav

File : Cristina_Davena_opus_convert.wav
Sample Rate : 48000
Signal Max : 32768 (0.00 dB)

$ sndfile-info Cristina_Davena_m4a_convert.wav
Sample Rate : 44100
Signal Max : 32768 (0.00 dB)

Both versions have true peak values over 1dB above 0 dBFS:
$ ebur128 Cristina_Davena_opus_convert.wav
Integrated loudness: 15.0 LU
Loudness range: 6.3 LU
Peak level 1.5 dB

$ ebur128 Cristina_Davena_m4a_convert.wav
Integrated loudness: 15.0 LU
Loudness range: 6.3 LU
Peak level 1.3 dB

You would need to attenuate by 1.5dB or more to be sure there were no points in the file which could be clipped.
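As a quick check of that figure, here is a minimal sketch of the headroom arithmetic using the two true-peak readings above. The function name and the optional ceiling parameter are assumptions added for illustration.

```python
def attenuation_needed_db(true_peak_db, ceiling_db=0.0):
    """Attenuation in dB (never negative) that brings the true peak
    down to the given ceiling."""
    return max(0.0, true_peak_db - ceiling_db)


# True-peak readings reported by ebur128 above.
readings = {"opus": 1.5, "m4a": 1.3}

for name, true_peak in readings.items():
    cut_db = attenuation_needed_db(true_peak)
    gain = 10.0 ** (-cut_db / 20.0)  # equivalent linear fader gain
    print(name, cut_db, round(gain, 3))
# opus 1.5 0.841
# m4a 1.3 0.861
```

Passing a lower ceiling, for example `attenuation_needed_db(1.5, ceiling_db=-1.0)`, gives 2.5 dB if you want to leave some margin below full scale.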

YouTube does not have uncompressed audio available, the best you can do is look for the format with the best quality codec and highest bitrate.
The output above showing all the formats available is youtube-dl with the -F option to display available formats, then you can download using -x to discard video and keep only audio, and -f <format_number> to pick the version you want.

You must be looking at a narrow range of tracks, definitely not a problem with all tracks.
Although I was a bit shocked just now when I looked at the videos from Sony Classical on the Yo-Yo Ma channel, they are at 6.0 LU according to ebur128, and classical recordings should typically be at around -14 LU, so about 20dB hotter than optimal.

OK, looks like lots of videos still have soundtracks that are compressed a lot more than necessary.
I apparently was getting playback and ingest confused. I thought YouTube was normalizing based on LU at ingest, but apparently the player takes care of it.

According to this site, the player will decrease the volume on playback to -14 LUFS:
Mastering for streaming
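A rough sketch of what that playback normalization amounts to, assuming the player only ever turns loud material down and never boosts quiet material. That is how the description above reads, but the player's real behavior is not documented in this thread. The loudness figure is the one quoted earlier for the Sony Classical videos, taken at face value, and the function name is hypothetical.

```python
def playback_gain_db(integrated_loudness, target=-14.0):
    """Gain in dB applied by a player that attenuates loud material to
    the target and leaves quieter material untouched (assumed behavior)."""
    return min(0.0, target - integrated_loudness)


print(playback_gain_db(6.0))    # -20.0: turned down by 20 dB
print(playback_gain_db(-20.0))  # 0.0: already below target, left alone
```

The same subtraction is a reasonable starting point for level-matching a reference track: measure the integrated loudness of both files and move the reference fader by the difference.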

tl;dr: the track you are using is not a good example to follow regarding recording levels. It does not follow current best practice, but apparently no one else does either; they are all just getting pounded down with a hammer at playback.