Audio drift import problem

Hi friends,

I’m mixing a live recording from two sources, one mono (mic -> Amadeus software) and one stereo (Zoom H4N). Both are over an hour long and 48kHz but for some reason there’s a considerable drift after aligning the tracks by an early transient.

My questions are:

  • is there a way to tell if one of the reported sample rates is incorrect? I’m using Ubuntu and mediainfo shows 48kHz for each file
  • What else could cause the problem? I import both files using the “Import…” dialogue in Ardour
  • is there a non-destructive way to fix this drift?

Thanks!

Output from mediainfo:

General
Complete name                            : 2017.05.22 I, Site_room.WAV
Format                                   : Wave
File size                                : 1 015 MiB
Duration                                 : 1h 1mn
Overall bit rate mode                    : Constant
Overall bit rate                         : 2 304 Kbps

Audio
Format                                   : PCM
Format settings, Endianness              : Little
Format settings, Sign                    : Signed
Codec ID                                 : 1
Duration                                 : 1h 1mn
Bit rate mode                            : Constant
Bit rate                                 : 2 304 Kbps
Channel(s)                               : 2 channels
Sampling rate                            : 48.0 KHz
Bit depth                                : 24 bits
Stream size                              : 1 015 MiB (100%)

General
Complete name                            : condenser mic/2017.05.22. dave's concert.wav
Format                                   : Wave
File size                                : 567 MiB
Duration                                 : 1h 8mn
Overall bit rate mode                    : Constant
Overall bit rate                         : 1 152 Kbps

Audio
Format                                   : PCM
Format settings, Endianness              : Little
Format settings, Sign                    : Signed
Codec ID                                 : 1
Duration                                 : 1h 8mn
Bit rate mode                            : Constant
Bit rate                                 : 1 152 Kbps
Channel(s)                               : 1 channel
Sampling rate                            : 48.0 KHz
Bit depth                                : 24 bits
Stream size                              : 567 MiB (100%)

Assuming you are in fact recording at 48k on both devices, you are running into a classic clock drift problem. This is why a master clock is used in the studio. Even though a device might record at a nominal 48kHz, clocks aren’t all identical and are often not completely precise, so one might actually be recording at 47593Hz and another at 48006Hz, with both treating it as 48k.
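To put rough numbers on that, here is a small Python sketch. The function name is made up for illustration, and the rates are just the example figures from above, not measurements of any real device.

```python
def drift_seconds(actual_rate_hz, nominal_rate_hz, elapsed_seconds):
    """Seconds by which a file runs long (+) or short (-) when a recorder
    that really sampled at actual_rate_hz is played back at nominal_rate_hz."""
    samples_written = actual_rate_hz * elapsed_seconds
    return samples_written / nominal_rate_hz - elapsed_seconds


# Example figures from the post above, over one hour of recording.
for rate in (48006, 47593):
    print(rate, "Hz ->", round(drift_seconds(rate, 48000, 3600), 3), "s after one hour")
```

Even the small 6Hz error adds up to almost half a second over an hour, which is easily audible when two tracks are mixed together.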

You need to clock sync the recording devices to avoid this problem.

This is also why JACK doesn’t tend to play well with multiple sound cards (they need to be clock sync’d).

As seablade says, this is why professional equipment can sync to an external sync signal. Consumer equipment does not have this option, and there is no easy way to correct the drift afterwards. You might get good enough results by splitting the songs into regions and syncing them manually (dragging regions on the timeline). You can also move regions a given number of samples at a time.

A device might be clocking at a relatively constant speed, so if you can work out the amount of drift over, for example, an hour, you might be able to calculate how many samples you need to move each region backwards or forwards. The adjustment does not need to be exact. It is enough to get it within a range where you can no longer hear the difference and where combining the two sound sources does not cause any obvious coloration of the sound.
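The calculation could look something like this Python sketch. It assumes the drift is constant and that you can find the same two transients (one early, one late) in both files. The function name and all sample positions are hypothetical.

```python
def region_offset_samples(ref_early, ref_late, other_early, other_late, ref_position):
    """Estimate how many samples the other track has drifted at ref_position.

    All arguments are sample positions. ref_early/ref_late are two transients
    in the reference file; other_early/other_late are the same two events in
    the other file. Assumes the tracks are already aligned at the early
    transient and that the drift is linear in between.
    A positive result means the other track is late at that point, so its
    region needs to move earlier by that many samples.
    """
    ref_span = ref_late - ref_early
    other_span = other_late - other_early
    return round((ref_position - ref_early) * (other_span - ref_span) / ref_span)


# Hypothetical positions: the second transient is one hour after the first in
# the reference file, and 21600 samples (0.45 s at 48kHz) later in the other.
ref_early = 480_000
ref_late = ref_early + 48_000 * 3600
other_early = 500_000
other_late = other_early + 48_000 * 3600 + 21_600

# Drift for a region that starts 30 minutes after the early transient.
halfway = ref_early + 48_000 * 1800
print(region_offset_samples(ref_early, ref_late, other_early, other_late, halfway))
```

With a linear drift the offset at the halfway point is simply half of the total, so this is mostly useful when you have many regions and want a starting value for each before fine-tuning by ear.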

The problem is caused by the fact that it is very difficult and expensive to build an accurate clock source. Crystal oscillators are commonly used for the clock; temperature affects the speed of the crystal, and the speed also changes as the crystal gets older. Even professional equipment won’t hold an accurate clock for long after syncing to an external clock source. One might need to resync the devices a couple of times during a day’s television shooting session.

You can avoid this problem by using only one recorder that has enough audio inputs for your needs.

I think rubberband may be able to do what you need (it is the CLI utility from the librubberband project). Assuming that the clocks in each device were slightly offset from nominal frequency but did not drift during the hour, you may be able to pick the best quality file as the reference length, and shrink or stretch the other file to match that length. Since the files only differ by 7 minutes over the course of an hour, the roughly 10% change should not make a huge difference. Although, now that I think about it, most of that has to be a difference in the start time of the devices: any reasonable quartz oscillator should be within 0.01% of nominal, and 10% off would be beyond broken. So you will likely have some work to do to determine how much the relevant parts of the files actually differ in length.

Assuming you can figure that out, the -D option to rubberband will let you specify a duration in seconds, or the -t option will let you give a fractional change (e.g. make the file 1.02 times the original length, or 0.995 times the original length, etc.).

$ rubberband --help

Rubber Band
An audio time-stretching and pitch-shifting library and utility program.
Copyright 2007-2012 Particular Programs Ltd.

Usage: rubberband [options] <infile.wav> <outfile.wav>

You must specify at least one of the following time and pitch ratio options.

  -t<X>, --time <X>       Stretch to X times original duration, or
  -T<X>, --tempo <X>      Change tempo by multiple X (same as --time 1/X), or
  -T<X>, --tempo <X>:<Y>  Change tempo from X to Y (same as --time X/Y), or
  -D<X>, --duration <X>   Stretch or squash to make output file X seconds long

  -p<X>, --pitch <X>      Raise pitch by X semitones, or
  -f<X>, --frequency <X>  Change frequency by multiple X

  -M<F>, --timemap <F>    Use file F as the source for key frame map
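As a rough sketch of how the --time value could be worked out, here is some Python that builds the command line from the measured lengths. The function name, file names and lengths are placeholders, and it assumes you have already measured the time between the same two events in both files.

```python
def rubberband_command(ref_span_seconds, other_span_seconds, infile, outfile):
    """Build a rubberband command that stretches infile so that a span of
    other_span_seconds ends up lasting ref_span_seconds.

    Both spans are the time between the same two events, measured in the
    reference file and in the file to be corrected.
    """
    ratio = ref_span_seconds / other_span_seconds  # value for --time
    return ["rubberband", "--time", f"{ratio:.6f}", infile, outfile]


# Hypothetical measurement: one hour in the reference file takes
# 3600.45 seconds in the other file.
cmd = rubberband_command(3600.0, 3600.45, "other.wav", "other_fixed.wav")
print(" ".join(cmd))
```

The printed line can be pasted into a terminal. Since rubberband writes a new output file, the original recording stays untouched.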

I think you should also be able to do that graphically: make regions in Ardour that match musical sections, then stretch or shrink the regions of one track to match the reference track.