2

I am trying to detect the pitch of a B3 note played with a guitar. The audio can be found here.

This is the spectrogram: spectrogram

As you can see, it is visible that the fundamental pitch is about 250Hz which corresponds to the B3 note.

It also contains a good amount of harmonics and that is why I chose to use HPS from here. I am using this code for detecting the pitch:

def freq_from_hps(signal, fs):
    """Estimate frequency using harmonic product spectrum
    Low frequency noise piles up and overwhelms the desired peaks
    """
    N = len(signal)
    signal -= mean(signal)  # Remove DC offset

    # Compute Fourier transform of windowed signal
    windowed = signal * kaiser(N, 100)

    # Get spectrum
    X = log(abs(rfft(windowed)))

    # Downsample sum logs of spectra instead of multiplying
    hps = copy(X)
    for h in arange(2, 9): # TODO: choose a smarter upper limit
        dec = decimate(X, h)
        hps[:len(dec)] += dec

    # Find the peak and interpolate to get a more accurate peak
    i_peak = argmax(hps[:len(dec)])
    i_interp = parabolic(hps, i_peak)[0]

    # Convert to equivalent frequency
    return fs * i_interp / N  # Hz

My sampling rate is 40000. However, instead of getting a result close to 250Hz (B3 note), I am getting 0.66Hz. How is this possible?

I also tried with an autocorrelation method from the same repo but I also get bad results like 10000Hz.

Thanks to an answer I understand I have to apply a filter to remove the low frequencies in the signal. How do I do that? Are there multiple methods to do that, and which one is recommended?

STATUS UPDATE:

The high-pass filter proposed by the answer is working. If I apply the function in the answer to my audio signal, it correctly displays about 245Hz. However, I would like to filter the whole signal, not only a part of it. A note could lie in the middle of the signal or a signal contain more than one note (I know a solution is onset detection, but I am curious to know why this isn't working). That is why I edited the code to return filtered_audio.

The problem is that if I do that, even though the noise has been correctly removed (see screenshot). I get 0.05 as a result.

spectrogram

pavlos163
  • 2,730
  • 4
  • 38
  • 82
  • have you done a loopback with your soundcard/audio interface or have another known frequency source? - others have reported difficulty getting desired sample rates depending on OS, drivers – f5r5e5d May 14 '17 at 20:39
  • You get nonsense when you try to calculate the pitch of the entire clip because the pitch algorithm doesn't see the spectrogram. The spectrogram shows the power spectrum over time, but the algorithm works with only the average power spectrum of the entire signal. To see what I mean, look at the spectrum of the whole signal and compare it to the spectrum of just the note (0-1 seconds). To address the possibility of multiple notes in a clip, I think you should make a new question. – Michael May 18 '17 at 14:37
  • Thanks, you are right. I was thinking I should segment the signal into multiple signals (one segment of 10-15 frames for every note played) and then do the HPS for every one of them. Do you think that's a logical approach? – pavlos163 May 18 '17 at 16:05

1 Answers1

1

Based on the distances between the harmonics in the spectrogram, I would estimate the pitch to be about 150-200 Hz. So, why doesn't the pitch detection algorithm detect the pitch that we can see by eye in the spectrogram? I have a few guesses:

The note only lasts for a few seconds. At the beginning, there is a beautiful harmonic stack with 10 or more harmonics! These quickly fade away and are not even visible after 5 seconds. If you are trying to estimate the pitch of the entire signal, your estimate might be contaminated by the "pitch" of the sound from 5-12 seconds. Try computing the pitch only for the first 1-2 seconds.

There is too much low frequency noise. In the spectrogram, you can see a lot of power between 0 and 64 Hz. This is not part of the harmonics, so you could try removing it with a high-pass filter.

Here is some code that does the job:

import numpy as np
from scipy.io import wavfile
from scipy import signal
import matplotlib.pyplot as plt

from frequency_estimator import freq_from_hps
# downloaded from https://github.com/endolith/waveform-analyzer/

filename = 'Vocaroo_s1KZzNZLtg3c.wav'
# downloaded from http://vocaroo.com/i/s1KZzNZLtg3c

# Parameters
time_start = 0  # seconds
time_end = 1  # seconds
filter_stop_freq = 70  # Hz
filter_pass_freq = 100  # Hz
filter_order = 1001

# Load data
fs, audio = wavfile.read(filename)
audio = audio.astype(float)

# High-pass filter
nyquist_rate = fs / 2.
desired = (0, 0, 1, 1)
bands = (0, filter_stop_freq, filter_pass_freq, nyquist_rate)
filter_coefs = signal.firls(filter_order, bands, desired, nyq=nyquist_rate)

# Examine our high pass filter
w, h = signal.freqz(filter_coefs)
f = w / 2 / np.pi * fs  # convert radians/sample to cycles/second
plt.plot(f, 20 * np.log10(abs(h)), 'b')
plt.ylabel('Amplitude [dB]', color='b')
plt.xlabel('Frequency [Hz]')
plt.xlim((0, 300))

# Apply high-pass filter
filtered_audio = signal.filtfilt(filter_coefs, [1], audio)

# Only analyze the audio between time_start and time_end
time_seconds = np.arange(filtered_audio.size, dtype=float) / fs
audio_to_analyze = filtered_audio[(time_seconds >= time_start) &
                                  (time_seconds <= time_end)]

fundamental_frequency = freq_from_hps(audio_to_analyze, fs)
print 'Fundamental frequency is {} Hz'.format(fundamental_frequency)
Michael
  • 969
  • 6
  • 19
  • Please see updated question. I added the audio file, fixed the spectrogram and restructured the question. – pavlos163 May 16 '17 at 19:28
  • That looks great. Do you mind explaining the steps you followed? Is everything under `filter_stop_freq` discarded? I don't really understand the use of most of these variables. – pavlos163 May 16 '17 at 23:19
  • I understand it is not easy to explain it here, so a reference would be amazing! :) – pavlos163 May 16 '17 at 23:21
  • Actually, this code does the job, but only if I keep `time_start` and `time_end`. However, doesn't it make sense to filter the whole signal? If I filter the whole signal, I still get values like 0.05. – pavlos163 May 17 '17 at 22:16
  • If I want to also filter high-frequency notes, is that valid code? https://pastebin.com/ytgMbP6x – pavlos163 May 28 '17 at 18:28