That is odd; there are 44.1 kHz files available from YouTube directly:
format code  extension  resolution  note
139          m4a        audio only  DASH audio 49k , m4a_dash container, mp4a.40.5 (22050Hz), 1.27MiB
251          webm       audio only  tiny 128k , webm_dash container, opus @128k (48000Hz), 3.35MiB
140          m4a        audio only  tiny 129k , m4a_dash container, mp4a.40.2@129k (44100Hz), 3.37MiB
278          webm       256x144     DASH video 95k , webm_dash container, vp9, 30fps, video only
160          mp4        256x144     DASH video 108k , mp4_dash container, avc1.4d400b, 30fps, video only
242          webm       426x240     DASH video 220k , webm_dash container, vp9, 30fps, video only
133          mp4        426x240     DASH video 242k , mp4_dash container, avc1.4d400c, 30fps, video only
134          mp4        640x360     360p 65k , mp4_dash container, avc1.4d401e@ 65k, 30fps, video only, 1.70MiB
243          webm       640x360     DASH video 405k , webm_dash container, vp9, 30fps, video only
135          mp4        854x480     480p 92k , mp4_dash container, avc1.4d401f@ 92k, 30fps, video only, 2.41MiB
244          webm       854x480     DASH video 752k , webm_dash container, vp9, 30fps, video only
18           mp4        640x360     360p 179k , avc1.42001E, 30fps, mp4a.40.2 (44100Hz), 4.67MiB (best)
As far as I know Opus has no 44.1 kHz mode (YouTube's Opus streams are 48 kHz), which is why format 251 is 48 kHz, but the m4a audio in format 140 is available at 44.1 kHz at about the same bitrate.
Both the Opus and m4a versions have a peak value at 0 dBFS when converted to WAV.
(sndfile-info output truncated to just the relevant info for clarity)
$ sndfile-info Cristina_Davena_opus_convert.wav
File : Cristina_Davena_opus_convert.wav
Sample Rate : 48000
Signal Max : 32768 (0.00 dB)
$ sndfile-info Cristina_Davena_m4a_convert.wav
Sample Rate : 44100
Signal Max : 32768 (0.00 dB)
Both versions have true peak values more than 1 dB above 0 dBFS:
$ ebur128 Cristina_Davena_opus_convert.wav
Integrated loudness: 15.0 LU
Loudness range: 6.3 LU
Peak level 1.5 dB
$ ebur128 Cristina_Davena_m4a_convert.wav
Integrated loudness: 15.0 LU
Loudness range: 6.3 LU
Peak level 1.3 dB
You would need to attenuate by 1.5 dB or more to be sure that no point in the file could clip.
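To make the arithmetic concrete, here is a rough Python sketch that turns the worst true peak reported above into a linear gain factor. The function name and the optional safety margin are my own additions for illustration, not anything ebur128 provides.

```python
def gain_for_true_peak(true_peak_db, margin_db=0.0):
    """Linear gain that pulls a true peak down to -margin_db dBTP.

    Hypothetical helper: a peak already below the limit gets a gain of 1.0.
    """
    attenuation_db = max(0.0, true_peak_db + margin_db)
    # dB to amplitude ratio: ratio = 10 ** (dB / 20)
    return 10 ** (-attenuation_db / 20.0)


# True peak levels from the ebur128 runs above.
peaks_db = {"opus": 1.5, "m4a": 1.3}

# Use the worst case so the same gain is safe for either file.
worst = max(peaks_db.values())
gain = gain_for_true_peak(worst)
print(f"attenuate by {worst:.1f} dB, i.e. multiply samples by {gain:.4f}")
```

In other words, 1.5 dB of attenuation means scaling every sample to roughly 84% of its original value.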
YouTube does not have uncompressed audio available. The best you can do is look for the format with the best-quality codec and the highest bitrate.
The format listing above is the output of youtube-dl with the -F option. You can then download with -x to discard the video and keep only the audio, and -f <format_number> to pick the version you want.
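If you want to script that choice, something like the following Python sketch could work. It parses the audio-only lines of the listing above and builds the matching youtube-dl command line without running it. The regular expression is only fitted to the lines shown in this post, "highest bitrate" is a simplification that ignores codec quality, and VIDEO_URL is a placeholder.

```python
import re

# Audio-only lines (plus one muxed line) copied from the listing above.
LISTING = """\
139 m4a audio only DASH audio 49k , m4a_dash container, mp4a.40.5 (22050Hz), 1.27MiB
251 webm audio only tiny 128k , webm_dash container, opus @128k (48000Hz), 3.35MiB
140 m4a audio only tiny 129k , m4a_dash container, mp4a.40.2@129k (44100Hz), 3.37MiB
18 mp4 640x360 360p 179k , avc1.42001E, 30fps, mp4a.40.2 (44100Hz), 4.67MiB (best)
"""

# Format code, extension, then the first "<number>k ," after "audio only".
AUDIO_LINE = re.compile(r"^(\d+)\s+(\w+)\s+audio only\s+.*?(\d+)k\s*,")


def audio_formats(listing):
    """Return (format_code, extension, bitrate_kbps) for each audio-only line."""
    formats = []
    for line in listing.splitlines():
        match = AUDIO_LINE.match(line.strip())
        if match:
            formats.append((match.group(1), match.group(2), int(match.group(3))))
    return formats


# Pick by bitrate alone; this is a simplification, not a codec comparison.
best = max(audio_formats(LISTING), key=lambda fmt: fmt[2])

# -x keeps only the audio, -f selects the format code.
command = ["youtube-dl", "-x", "-f", best[0], "VIDEO_URL"]
print(" ".join(command))
```

With this listing the sketch lands on format 140, the 44.1 kHz m4a stream, though 251 (Opus at 128k) is close enough in bitrate that you may prefer it for the codec.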
You must be looking at a narrow range of tracks; this is definitely not a problem with all tracks.
That said, I was a bit shocked just now when I looked at the videos from Sony Classical on the Yo-Yo Ma channel. They measure 6.0 LU according to ebur128, and classical recordings should typically sit at around -14 LU, so they are about 20 dB hotter than optimal.
OK, looks like lots of videos still have soundtracks that are compressed a lot more than necessary.
I apparently was getting playback and ingest confused. I thought YouTube was normalizing based on LU at ingest, but apparently the player takes care of it.
According to this site, the player will decrease the volume on playback to -14 LUFS:
Mastering for streaming
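As I understand that description, the playback adjustment is just the difference between the target and the measured loudness. Here is a small Python sketch of that idea. The turn-down-only behaviour is my reading of "decrease the volume", and the example loudness values are made up for illustration rather than measured from any track.

```python
TARGET_LUFS = -14.0  # playback target quoted from the linked article


def playback_gain_db(integrated_lufs, target_lufs=TARGET_LUFS):
    """Gain in dB a turn-down-only player would apply (assumed behaviour).

    Tracks louder than the target are attenuated; quieter tracks are left alone.
    """
    return min(0.0, target_lufs - integrated_lufs)


# Hypothetical loudness values, for illustration only.
for measured in (-8.0, -14.0, -20.0):
    print(f"{measured:6.1f} LUFS -> {playback_gain_db(measured):5.1f} dB")
```

So a master pushed 6 dB hotter than the target simply gets turned down by 6 dB at playback and gains nothing in perceived loudness.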
tl;dr: the track you are using is not a good example to follow for recording levels. It does not follow current best practice, but apparently no one else does either; they are all just getting pounded down with a hammer at playback.