How to measure delay between streams using Audio Fingerprinting

Question

I need to measure the difference in delay between streams of the same TV channel on different platforms. The details are as follows.

Different platforms do not show a live TV channel at exactly the same time; for several reasons the streams end up within several seconds of each other, and the delay differs from one platform to another.

My plan is to record each stream and then use audio fingerprinting in Python with the dejavu library (the language can be changed). But how can I find the delay between two streams using audio fingerprinting? For example, I want to compare the delay of the same TV channel on the web, on a mobile platform, and on a television. How can I record the streams from these platforms and process the recordings?

I would be happy to hear your suggestions.

Fraser · Answer 1 · 2018-03-28T11:06:30.590

0

It sounds like you'll want to cross-correlate the two signals. The position of the highest peak in the output corresponds to the time delay. The inputs also need not be identical, which is handy in this case.
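
As a rough illustration of the cross-correlation idea, here is a minimal sketch using SciPy. The function name `estimate_delay` and the synthetic white-noise signals are made up for this example; real recordings would be loaded from files instead. It assumes both signals share one sample rate and uses `scipy.signal.correlation_lags` (available in SciPy 1.6 and later) to turn the peak index into a signed lag.

```python
import numpy as np
from scipy import signal


def estimate_delay(reference, delayed, sample_rate):
    """Return how many seconds `delayed` lags behind `reference`.

    A negative result means `delayed` is actually ahead of `reference`.
    """
    corr = signal.correlate(delayed, reference, mode="full", method="fft")
    lags = signal.correlation_lags(len(delayed), len(reference), mode="full")
    return lags[np.argmax(corr)] / sample_rate


# Synthetic check: white noise stands in for the audio of one channel.
rng = np.random.default_rng(0)
sample_rate = 8000
reference = rng.standard_normal(4 * sample_rate)
true_delay = 3000  # delay in samples, i.e. 0.375 s at 8 kHz
delayed = np.concatenate([np.zeros(true_delay), reference])[: len(reference)]
# Add a little noise so the two copies are similar but not identical.
delayed = delayed + 0.1 * rng.standard_normal(len(delayed))

delay_seconds = estimate_delay(reference, delayed, sample_rate)
print(delay_seconds)
```

Using `mode="full"` keeps both positive and negative lags, so it does not matter in advance which platform is ahead.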

edited Mar 28 '18 at 11:06

answered Mar 28 '18 at 10:56


score 0 · Answer 2 · answered Oct 10 '22 at 15:58

Source and for further explanation: https://dev.to/hiisi13/find-an-audio-within-another-audio-in-10-lines-of-python-1866

First, decode both recordings to PCM at a single sample rate that you choose beforehand (e.g. 16 kHz), resampling any recording that has a different rate. A high sample rate is not required, since the comparison is fuzzy anyway, but too low a sample rate loses too much detail.
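
If the resampling has to happen in Python rather than at decode time, something along these lines may work. This is only a sketch: the helper name `to_mono_at_rate` is hypothetical, the 16 kHz default simply mirrors the example rate above, and it assumes the samples are already in a NumPy array shaped `(n_samples, n_channels)` or `(n_samples,)`.

```python
from math import gcd

import numpy as np
from scipy import signal


def to_mono_at_rate(samples, original_rate, target_rate=16000):
    """Downmix to mono and resample to `target_rate` Hz."""
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim == 2:
        # Assumes shape (n_samples, n_channels); average the channels.
        samples = samples.mean(axis=1)
    # Reduce the rate ratio to the smallest integer up/down factors.
    common = gcd(int(original_rate), int(target_rate))
    up = int(target_rate) // common
    down = int(original_rate) // common
    # Polyphase resampling applies an anti-aliasing filter internally.
    return signal.resample_poly(samples, up, down)


# One second of a 440 Hz stereo tone at 48 kHz, resampled to 16 kHz.
t = np.arange(48000) / 48000
stereo = np.stack([np.sin(2 * np.pi * 440 * t)] * 2, axis=1)
mono_16k = to_mono_at_rate(stereo, 48000)
print(mono_16k.shape)
```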

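You can use the following commands for that (-vn drops any video stream, -ac 1 downmixes to mono, and -ar 16000 resamples to 16 kHz):

ffmpeg -i audio1.mkv -vn -ac 1 -ar 16000 -c:a pcm_s24le output1.wav
ffmpeg -i audio2.mkv -vn -ac 1 -ar 16000 -c:a pcm_s24le output2.wav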

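Then you can use the following script. librosa.load decodes each file to floating-point samples in the range [-1, 1], so the script itself does no explicit peak normalization (i.e. finding the maximum sample value and rescaling all samples so that the largest amplitude uses the entire dynamic range). It cross-correlates the first few seconds of the target file with the longer recording using an FFT-based method, finds the peak, and returns the position of that peak as an offset in seconds: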

import argparse

import librosa
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt


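def find_offset(within_file, find_file, window):
    """Return the offset (in seconds) of find_file inside within_file."""
    # Load the longer recording at its native sample rate, then load the
    # target at that same rate so both share one time base.
    y_within, sr_within = librosa.load(within_file, sr=None)
    y_find, _ = librosa.load(find_file, sr=sr_within)

    # Cross-correlate the first `window` seconds of the target against the
    # longer recording; the index of the peak is the offset in samples.
    n_window = int(sr_within * window)
    c = signal.correlate(y_within, y_find[:n_window], mode='valid', method='fft')
    peak = np.argmax(c)
    offset = round(peak / sr_within, 2)

    # Save a plot of the correlation for visual inspection.
    fig, ax = plt.subplots()
    ax.plot(c)
    fig.savefig("cross-correlation.png")

    return offset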


def main():
    parser = argparse.ArgumentParser()
    # Both file arguments are required; without them librosa.load would receive None.
    parser.add_argument('--find-offset-of', metavar='audio file', type=str, required=True, help='Find the offset of file')
    parser.add_argument('--within', metavar='audio file', type=str, required=True, help='Within file')
    parser.add_argument('--window', metavar='seconds', type=int, default=10, help='Only use first n seconds of a target audio')
    args = parser.parse_args()
    offset = find_offset(args.within, args.find_offset_of, args.window)
    print(f"Offset: {offset}s")


if __name__ == '__main__':
    main()
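
The location of the peak alone does not say whether the match is any good, since `np.argmax` returns a position even for unrelated audio. One possible sanity check, sketched below under the assumption that both signals are NumPy arrays at the same sample rate, is to compute the Pearson correlation between the query window and the slice of the longer recording at the reported offset. The function name `match_quality` and its parameters are hypothetical and not part of the script above.

```python
import numpy as np


def match_quality(y_within, y_find, offset_seconds, sample_rate, window_seconds):
    """Pearson correlation between the query window and the aligned slice."""
    start = int(round(offset_seconds * sample_rate))
    length = int(window_seconds * sample_rate)
    query = y_find[:length]
    aligned = y_within[start:start + len(query)]
    # Trim both to a common length in case the slice runs off the end.
    n = min(len(query), len(aligned))
    if n < 2:
        return 0.0
    return float(np.corrcoef(query[:n], aligned[:n])[0, 1])


# Synthetic check: the query is an exact excerpt starting 2 s into the recording.
rng = np.random.default_rng(1)
sample_rate = 8000
recording = rng.standard_normal(10 * sample_rate)
query = recording[2 * sample_rate:5 * sample_rate]

good = match_quality(recording, query, 2.0, sample_rate, 3)  # correct offset
bad = match_quality(recording, query, 4.0, sample_rate, 3)   # wrong offset
print(good, bad)
```

A value near 1 suggests a genuine match, while a value near 0 suggests the peak is spurious. The comparison is sensitive to misalignment of even a few milliseconds, so it is probably better to pass the unrounded `peak / sr_within` than the value rounded to two decimals.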
