Autocorrelation code in Python produces errors (guitar pitch detection)

Question

This link provides code for an autocorrelation-based pitch detection algorithm. I am using it to detect pitches in simple guitar melodies.

In general, it produces very good results. For example, for the melody C4, C#4, D4, D#4, E4 it outputs:

262.743653536
272.144441273
290.826273006
310.431336809
327.094621169

Which correlates to the correct notes.

However, in some cases like this audio file (E4, F4, F#4, G4, G#4, A4, A#4, B4) it produces errors:

325.861452246
13381.6439242
367.518651703
391.479384923
414.604661221
218.345286173
466.503751322
244.994090035

More specifically, there are three errors here: 13381Hz is wrongly detected instead of F4 (~350Hz) (weird error), and also 218Hz instead of A4 (440Hz) and 244Hz instead of B4 (~493Hz), which are octave errors.

I assume the two errors are caused by something different? Here is the code:

slices = segment_signal(y, sr)
for segment in slices:
  pitch = freq_from_autocorr(segment, sr)
  print pitch

def segment_signal(y, sr, onset_frames=None, offset=0.1):
  if (onset_frames == None):
    onset_frames = remove_dense_onsets(librosa.onset.onset_detect(y=y, sr=sr))

  offset_samples = int(librosa.time_to_samples(offset, sr))

  print onset_frames

  slices = np.array([y[i : i + offset_samples] for i
    in librosa.frames_to_samples(onset_frames)])

  return slices

You can see the freq_from_autocorr function in the first link above.

The only think that I have changed is this line:

corr = corr[len(corr)/2:]

Which I have replaced with:

corr = corr[int(len(corr)/2):]

UPDATE:

I noticed the smallest the offset I use (the smallest the signal segment I use to detect each pitch), the more high-frequency (10000+ Hz) errors I get.

Specifically, I noticed that the part that goes differently in those cases (10000+ Hz) is the calculation of the i_peak value. When in cases with no error it is in the range of 50-150, in the case of the error it is 3-5.

A bit of theory: The smallest the signal segment you use, the less accuracy you'll get when detecting high frequencies. Indeed, when you work on a signal segment, you basically multiply the full signal by a time window. In the frequency domain, this corresponds to a convolution of the signal's spectrum by a sinc function ((sin x)/x). The spectrum's elements are "averaged" with their neighbours. The smaller your time window, the broader the sinc convolution will be, and the more "averaging" you'll get. Averaging acts like a low-pass filter: high frequencies tend to fade... — ma3oun, Jun 01 '17 at 13:39

score 3 · Accepted Answer · answered May 31 '17 at 06:03

The autocorrelation function in the code snippet that you linked is not particularly robust. In order to get the correct result, it needs to locate the first peak on the left hand side of the autocorrelation curve. The method that the other developer used (calling the numpy.argmax() function) does not always find the correct value.

I've implemented a slightly more robust version, using the peakutils package. I don't promise that it's perfectly robust either, but in any case it achieves a better result than the version of the freq_from_autocorr() function that you were previously using.

My example solution is listed below:

import librosa
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import fftconvolve
from pprint import pprint
import peakutils

def freq_from_autocorr(signal, fs):
    # Calculate autocorrelation (same thing as convolution, but with one input
    # reversed in time), and throw away the negative lags
    signal -= np.mean(signal)  # Remove DC offset
    corr = fftconvolve(signal, signal[::-1], mode='full')
    corr = corr[len(corr)//2:]

    # Find the first peak on the left
    i_peak = peakutils.indexes(corr, thres=0.8, min_dist=5)[0]
    i_interp = parabolic(corr, i_peak)[0]

    return fs / i_interp, corr, i_interp

def parabolic(f, x):
    """
    Quadratic interpolation for estimating the true position of an
    inter-sample maximum when nearby samples are known.

    f is a vector and x is an index for that vector.

    Returns (vx, vy), the coordinates of the vertex of a parabola that goes
    through point x and its two neighbors.

    Example:
    Defining a vector f with a local maximum at index 3 (= 6), find local
    maximum if points 2, 3, and 4 actually defined a parabola.

    In [3]: f = [2, 3, 1, 6, 4, 2, 3, 1]

    In [4]: parabolic(f, argmax(f))
    Out[4]: (3.2142857142857144, 6.1607142857142856)
    """
    xv = 1/2. * (f[x-1] - f[x+1]) / (f[x-1] - 2 * f[x] + f[x+1]) + x
    yv = f[x] - 1/4. * (f[x-1] - f[x+1]) * (xv - x)
    return (xv, yv)

# Time window after initial onset (in units of seconds)
window = 0.1

# Open the file and obtain the sampling rate
y, sr = librosa.core.load("./Vocaroo_s1A26VqpKgT0.mp3")
idx = np.arange(len(y))

# Set the window size in terms of number of samples
winsamp = int(window * sr)

# Calcualte the onset frames in the usual way
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onstm = librosa.frames_to_time(onset_frames, sr=sr)

fqlist = [] # List of estimated frequencies, one per note
crlist = [] # List of autocorrelation arrays, one array per note
iplist = [] # List of peak interpolated peak indices, one per note
for tm in onstm:
    startidx = int(tm * sr)
    freq, corr, ip = freq_from_autocorr(y[startidx:startidx+winsamp], sr)
    fqlist.append(freq)
    crlist.append(corr)
    iplist.append(ip)    

pprint(fqlist)

# Choose which notes to plot (it's set to show all 8 notes in this case)
plidx = [0, 1, 2, 3, 4, 5, 6, 7]

# Plot amplitude curves of all notes in the plidx list 
fgwin = plt.figure(figsize=[8, 10])
fgwin.subplots_adjust(bottom=0.0, top=0.98, hspace=0.3)
axwin = []
ii = 1
for tm in onstm[plidx]:
    axwin.append(fgwin.add_subplot(len(plidx)+1, 1, ii))
    startidx = int(tm * sr)
    axwin[-1].plot(np.arange(startidx, startidx+winsamp), y[startidx:startidx+winsamp])
    ii += 1
axwin[-1].set_xlabel('Sample ID Number', fontsize=18)
fgwin.show()

# Plot autocorrelation function of all notes in the plidx list
fgcorr = plt.figure(figsize=[8,10])
fgcorr.subplots_adjust(bottom=0.0, top=0.98, hspace=0.3)
axcorr = []
ii = 1
for cr, ip in zip([crlist[ii] for ii in plidx], [iplist[ij] for ij in plidx]):
    if ii == 1:
        shax = None
    else:
        shax = axcorr[0]
    axcorr.append(fgcorr.add_subplot(len(plidx)+1, 1, ii, sharex=shax))
    axcorr[-1].plot(np.arange(500), cr[0:500])
    # Plot the location of the leftmost peak
    axcorr[-1].axvline(ip, color='r')
    ii += 1
axcorr[-1].set_xlabel('Time Lag Index (Zoomed)', fontsize=18)
fgcorr.show()

The printed output looks like:

In [1]: %run autocorr.py
[325.81996740236065,
 346.43374761017725,
 367.12435233192753,
 390.17291696559079,
 412.9358117076161,
 436.04054933498134,
 465.38986619237039,
 490.34120132405866]

The first figure produced by my code sample depicts the amplitude curves for the next 0.1 seconds following each detected onset time:

The second figure produced by the code shows the autocorrelation curves, as computed inside of the freq_from_autocorr() function. The vertical red lines depict the location of the first peak on the left for each curve, as estimated by the peakutils package. The method used by the other developer was getting incorrect results for some of these red lines; that's why his version of that function was occasionally returning the wrong frequencies.

My suggestion would be to test the revised version of the freq_from_autocorr() function on other recordings, see if you can find more challenging examples where even the improved version still gives incorrect results, and then get creative and try to develop an even more robust peak finding algorithm that never, ever mis-fires.

That is indeed an improvement. However, I sometimes get this error: `i_peak = peakutils.indexes(corr, thres=0.8, min_dist=5)[0] IndexError: index 0 is out of bounds for axis 0 with size 0` as `indexes` does not find anything with that threshold. — pavlos163, Jun 01 '17 at 18:35
Fixed it with: https://pastebin.com/2yVVWsAJ. Thanks a lot. Any other tips? — pavlos163, Jun 02 '17 at 13:58

score 0 · Answer 2 · answered May 30 '17 at 22:04

The autocorrelation method is not always right. You may want to implement a more sophisticated method like YIN:

http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf

or MPM:

http://www.cs.otago.ac.nz/tartini/papers/A_Smarter_Way_to_Find_Pitch.pdf

Both of the above papers are good reads.

Autocorrelation code in Python produces errors (guitar pitch detection)

2 Answers2