
I'm creating a voice training app and I've used FFT to transform the signal from time domain to frequency domain. Prior to applying FFT I've windowed the signal using blackman-harris window. Then I used harmonic product spectrum to extract the fundamental frequency. The lowest frequency is F2 (87.307 Hz) and the highest is C6 (1046.502 Hz). FFT Length is 8192 and the sampling frequency is 44100 Hz.
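For context, my HPS step is the usual downsample-and-multiply scheme. Here is a minimal sketch of what hps.HPS(Data) does (the method name and the choice of 5 harmonics are placeholders; my actual implementation may differ in detail):

    // Minimal HPS sketch, assuming the magnitude spectrum is already computed.
    // The number of harmonics (5) is a placeholder.
    static float[] HarmonicProductSpectrum(float[] magnitude, int harmonics = 5)
    {
        int length = magnitude.Length / harmonics;
        float[] hps = new float[length];

        for (int i = 0; i < length; i++)
        {
            float product = magnitude[i];
            for (int h = 2; h <= harmonics; h++)
            {
                // Downsampling by h lines the h-th harmonic up with bin i.
                product *= magnitude[i * h];
            }
            hps[i] = product;
        }
        return hps;
    }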

To fix the octave errors, I applied the rule mentioned here as follows:

    float[] array = hps.HPS(Data);
    float hpsmax_mag = float.MinValue;
    int hpsmax_index = -1;

    // Find the peak of the harmonic product spectrum.
    for (int i = 0; i < array.Length; i++)
    {
        if (array[i] > hpsmax_mag)
        {
            hpsmax_mag = array[i];
            hpsmax_index = i;
        }
    }

    // Fixing octave-too-high errors: look for a strong peak below
    // 3/4 of the detected bin, i.e. a possible true fundamental.
    int correctMaxBin = 1;
    int maxsearch = hpsmax_index * 3 / 4;
    for (int j = 2; j < maxsearch; j++)
    {
        if (array[j] > array[correctMaxBin])
        {
            correctMaxBin = j;
        }
    }

    // If that lower peak sits near half the detected bin and is not
    // negligible relative to the HPS peak, take it as the fundamental.
    if (Math.Abs(correctMaxBin * 2 - hpsmax_index) < 4)
    {
        if (array[correctMaxBin] / array[hpsmax_index] > 0.2)
        {
            hpsmax_index = correctMaxBin;
        }
    }
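To turn the final bin index back into a frequency for the feedback stage, I use the usual bin-width relation: with 44100 Hz / 8192 ≈ 5.38 Hz per bin, F2 (87.307 Hz) lands near bin 16. The helper below is just illustrative:

    // Convert an FFT/HPS bin index back to a frequency in Hz.
    // With fs = 44100 and fftLength = 8192 each bin is about 5.38 Hz wide.
    static float BinToFrequency(int bin, float sampleRate = 44100f, int fftLength = 8192)
    {
        return bin * sampleRate / fftLength;
    }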

I tested the system using sawtooth waves and noticed that the octave errors are still visible: from 87.307 Hz up to roughly 190 Hz it gives octave-high errors, and from G5 (783.991 Hz) upwards it sometimes reports an octave lower.
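For reference, the sawtooth test tones were generated along these lines (a sketch; the amplitude, duration, and method name are arbitrary):

    // Generate a sawtooth test tone; a sawtooth contains every harmonic,
    // which makes it a useful stress test for octave detection.
    static float[] Sawtooth(float frequency, float sampleRate, int sampleCount)
    {
        float[] samples = new float[sampleCount];
        for (int n = 0; n < sampleCount; n++)
        {
            double t = n * frequency / sampleRate;              // phase in cycles
            samples[n] = (float)(2.0 * (t - Math.Floor(t + 0.5))); // range [-1, 1)
        }
        return samples;
    }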

Here are some of the results:

    Input             | Result                                | Error
    F2 (87.307 Hz)    | F4 (349.228 Hz)                       | 2 octaves higher
    G2 (97.999 Hz)    | G4 (391.995 Hz)                       | 2 octaves higher
    A2 (110 Hz)       | A3 (220 Hz)                           | an octave higher
    D3 (146.832 Hz)   | mostly D4 (293.665 Hz), sometimes D3  | an octave higher
    A3 (220 Hz)       | A3                                    | correct
    A4 (440 Hz)       | A4                                    | correct
    G5 (783.991 Hz)   | mostly G5, sometimes G4 (391.995 Hz)  | an octave lower
    A5 (880 Hz)       | A5                                    | correct
    C6 (1046.502 Hz)  | C6                                    | correct

Please help me fix this, because it badly affects the system's final feedback to the user.

1 Answer


When I detected pitch and octave from polyphonic signals in MP3 recordings, I used a slightly different approach. In order to identify the harmonics that make up a 'pitch', I chose to use a modified DFT that was logarithmically spaced, rather than an FFT.
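The idea (a rough sketch only, not the actual code in the repository linked below) is to evaluate the DFT directly at logarithmically spaced frequencies, for example a fixed number of bins per octave between the lowest and highest notes of interest:

    // Direct DFT evaluated at logarithmically spaced frequencies
    // (binsPerOctave bins per octave between fMin and fMax).
    // A sketch of the idea, not the code in PitchScope Player.
    static float[] LogSpacedSpectrum(float[] x, float sampleRate,
                                     float fMin, float fMax, int binsPerOctave)
    {
        int binCount = (int)Math.Ceiling(binsPerOctave * Math.Log(fMax / fMin, 2.0));
        float[] magnitude = new float[binCount];

        for (int k = 0; k < binCount; k++)
        {
            double freq = fMin * Math.Pow(2.0, (double)k / binsPerOctave);
            double re = 0.0, im = 0.0;
            for (int n = 0; n < x.Length; n++)
            {
                double phase = 2.0 * Math.PI * freq * n / sampleRate;
                re += x[n] * Math.Cos(phase);
                im -= x[n] * Math.Sin(phase);
            }
            magnitude[k] = (float)Math.Sqrt(re * re + im * im);
        }
        return magnitude;
    }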

I also decided on using a Two Stage Algorithm to detect pitch, which determines the Octave (and thus the implied Fundamental Frequency) later, in the second stage. The algorithm works like this:

a) First the ScalePitch of the dominant note is detected -- 'ScalePitch' has 12 possible pitch values: { E, F, F#, G, G#, A, A#, B, C, C#, D, D# }. Once the ScalePitch and Time-Width of a note are determined,

b) then the Octave (fundamental) of that note is calculated by examining ALL the harmonics of 4 possible Octave-Candidate notes.
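As a simplified illustration of stage (b) only (this is not the actual Calc_Best_Octave_Candidate code, which is more involved), each Octave-Candidate can be scored by the spectral energy found at its harmonics, and the best-supported candidate becomes the fundamental:

    // Simplified sketch of stage (b): score each octave candidate by the
    // spectral energy at its harmonics, then pick the best-supported
    // fundamental. The real algorithm in FundCandidCalcer.cpp differs.
    static double BestOctaveCandidate(float[] magnitude, float binWidthHz,
                                      double[] candidateFundamentalsHz, int harmonics = 8)
    {
        double bestFreq = candidateFundamentalsHz[0];
        double bestScore = double.MinValue;

        foreach (double f0 in candidateFundamentalsHz)
        {
            double score = 0.0;
            for (int h = 1; h <= harmonics; h++)
            {
                int bin = (int)Math.Round(h * f0 / binWidthHz);
                if (bin < magnitude.Length)
                    score += magnitude[bin];
            }
            if (score > bestScore)
            {
                bestScore = score;
                bestFreq = f0;
            }
        }
        return bestFreq;
    }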

Octave Detection can be very tricky, especially on a polyphonic signal where the fundamental harmonic and/or other harmonics are missing. But my algorithm will work even if some harmonics are missing. You might want to compile and step through my Windows code for PitchScope Player on GitHub to see how I determine the octave.

You would want to focus on the function FundCandidCalcer::Calc_Best_Octave_Candidate() within the file FundCandidCalcer.cpp to see the Octave Detection algorithm in C++.

https://github.com/CreativeDetectors/PitchScope_Player

https://en.wikipedia.org/wiki/Transcription_(music)#Pitch_detection

The diagram below demonstrates the Octave Detection algorithm which I developed to pick the correct Octave-Candidate note (that is, the correct Fundamental), once the ScalePitch and harmonics for that note have been determined.

[diagram: Octave Detection, selecting the correct Octave-Candidate note]