Clock Drift in Multimedia Recordings
Introduction
Clock drift, put simply, is the tendency of a clock to run slightly too fast or too slow compared to a reference time, which today is Coordinated Universal Time (UTC). For example, you may have noticed that you need to readjust your alarm clock or wristwatch every once in a while, because it falls behind and makes you late, or it wakes you too early and robs you of precious minutes of sleep. Because no clock in the world runs at exactly the ideal speed, all of them have inherent clock drift, even your expensive Rolex. If you really wanted a precise clock, you could try to get an atomic clock like the NIST-F2, which is expected to neither gain nor lose a second in about 300 million years. Much more detailed information on this topic can be found in a great paper called The Science of Timekeeping by David W. Allan et al.
Because digital devices are always driven by clocks (mostly in the form of crystal oscillators), which coordinate the circuitry and tell the chips when to execute an action, they are all prone to drift. This affects not only the time they display to the user, but many other things as well. For example: a computer’s CPU does not run at exactly its specified 3 GHz, a cinema does not show a movie at exactly the specified 24 pictures per second, an iPod does not play back music at exactly its nominal 44.1 kHz sampling rate, and your smartphone, camcorder, etc. does not record audio and video at exactly the frame and sampling rates stated in the manual. The reason you’ve probably never heard about this effect is that it is so small that nobody will notice it on a single device in isolation. It can still become a problem when working with different devices in parallel. In the multimedia domain, the problem becomes apparent when working with parallel audio and/or video recordings from different devices. Professional studio environments avoid it by using specialized and expensive equipment, where multiple devices are synchronized to a single central clock or a single device records multiple tracks, but every amateur audio- or videographer will sooner or later run into the problem.
Examples
Let’s take a very simple example: You want to conduct an interview and make a video of it. Since your video camera does not record good-quality audio, you additionally record the interview with a separate audio recorder, intending to replace the camera’s audio track with the superior track from the audio recorder. On your computer, you synchronize both recordings at the start of the interview, but, unfortunately, you notice that they drift out of sync towards the end. Although both tracks were recorded over the same span of time, on your computer one of them runs faster than the other. Technically, assuming that both devices recorded audio at a nominal sampling rate of 48000 Hz, the following could have happened: The video camera actually recorded 48010 samples per second and the audio recorder 47980 samples per second, which means that the former recorded too many samples and the latter too few. When both recordings are played back on a computer at the ideal sampling rate of 48000 samples per second, the camera recording plays back slower than real time, and the audio recorder’s recording plays back faster. This is because the computer assumes 48000 samples to be exactly one second, so playing back 48010 samples takes slightly longer than a second. The reverse applies to the second recording. No matter what the computer’s playback sampling rate really is, the two uncorrected recordings will never play back at the same speed, because recordings made on two different time bases cannot be synchronous on a single time base.
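To put numbers on this, here is a minimal Python sketch that works through the arithmetic of the example. The one-hour interview length and the function names are assumptions made for illustration; the sampling rates are the ones used above.

```python
NOMINAL_RATE = 48_000  # Hz, the rate the computer assumes during playback


def playback_seconds(real_seconds, actual_rate, playback_rate=NOMINAL_RATE):
    """Length of a recording when its samples are played at playback_rate."""
    samples_recorded = real_seconds * actual_rate
    return samples_recorded / playback_rate


def drift_ppm(actual_rate, nominal_rate=NOMINAL_RATE):
    """Deviation of the actual rate from the nominal rate, in parts per million."""
    return (actual_rate - nominal_rate) / nominal_rate * 1e6


interview = 3600  # seconds of real time (assumed: a one-hour interview)

camera = playback_seconds(interview, 48_010)    # recorded too many samples
recorder = playback_seconds(interview, 47_980)  # recorded too few samples
offset = camera - recorder

print(f"camera plays for   {camera:.2f} s")     # 3600.75 s
print(f"recorder plays for {recorder:.2f} s")   # 3598.50 s
print(f"offset at the end  {offset:.2f} s")     # 2.25 s
print(f"camera drift   {drift_ppm(48_010):.1f} ppm")   # 208.3 ppm
print(f"recorder drift {drift_ppm(47_980):.1f} ppm")   # -416.7 ppm
```

With these assumed numbers, two tracks that are perfectly aligned at the start end up more than two seconds apart after an hour, which is far beyond what is tolerable for lip sync.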
Solutions
There are basically three ways to solve the problem: (1) money, (2) time, and (3) luck. Money means buying professional equipment, using it correctly, and never having to care about the problem again. Time means simply recording with multiple devices and tackling the problem in post-production, either by resampling the recordings to a common time base, or by hiding the drift through intelligent cutting between clips and synchronizing each excerpt on its own. Resampling works well with audio tracks, whereas resampling the frame rate of a video is usually not a good idea, because it often comes with a noticeable degradation of picture quality, which leaves cutting as the only option. The third way, luck, means finding multiple devices that go well together, that is, devices whose drift relative to each other is so small that it does not become noticeable over the length of the recordings you need to make.
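The resampling option can be sketched in a few lines of Python. This is only an illustration of the idea, under the assumption that the actual recording rate is already known (for example, measured from how far two tracks have drifted apart). The function name is hypothetical, and linear interpolation is used here for brevity; real audio tools use band-limited resampling filters, which preserve quality much better.

```python
def resample_linear(samples, actual_rate, target_rate):
    """Map audio recorded at actual_rate onto a target_rate time base.

    Uses linear interpolation between neighbouring samples. This is a
    simplification for illustration, not production-quality resampling.
    """
    if not samples:
        return []
    last = len(samples) - 1
    # Input samples consumed per output sample.
    step = actual_rate / target_rate
    # Number of output samples covering the same span of real time.
    n_out = round(len(samples) * target_rate / actual_rate)
    out = []
    for i in range(n_out):
        pos = i * step                # fractional position in the input
        left = min(int(pos), last)    # clamp at the final sample
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# One real second from each device in the interview example above.
camera_second = [float(n) for n in range(48_010)]
recorder_second = [float(n) for n in range(47_980)]

camera_fixed = resample_linear(camera_second, 48_010, 48_000)
recorder_fixed = resample_linear(recorder_second, 47_980, 48_000)

print(len(camera_fixed), len(recorder_fixed))
```

After resampling, one real second from either device is represented by the same number of samples, so both tracks play back at the same speed on the common 48000 Hz time base.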
Conclusion
When working with recordings from different sources that need to be synchronized, clock drift always needs to be taken care of. Unless professional synchronized devices are used, this includes all cases where events such as speeches or concerts are recorded with multiple cameras, whether by a specialized crew or sourced from the crowd (including the internet, e.g. YouTube). Great examples are a full crowd-sourced concert video of Radiohead, with contributions from more than 60 fans, and a music video of DJ Tiesto, made entirely of videos from fans’ mobile phones.
Disclaimer
Please note that the explanation in this text is simplified and treats clock drift as a constant, which is not entirely true. There are non-linear short-term factors that influence the drift over time, like battery voltage and temperature, and long-term factors like the aging of crystal oscillators. These usually contribute only a small but measurable part of the overall drift, while the large constant part results from production tolerances in crystal manufacturing.