I've done a lot of Google searching but haven't been able to find an example on how to determine the musical note of mp3 files.

So far, I've read something about FFT (Fast Fourier Transform) from which the pitch of an audio file can be calculated and based on the pitch notation the musical note can be derived.

But then I read that the mp3 file format is in the time domain and that, because of its lossy compression, it doesn't contain the sample values necessary for frequency analysis... Does that mean that you have to convert the mp3 to a wav file in order to calculate the key?

I've found a couple of examples of real-time pitch detection for visual purpose but not for analysing an entire mp3 file and outputting the musical key.

I hope someone can point me in the right direction.

Thanks.

Ace
  • "the mp3 file format is in the time domain" - well, not quite. It is a coded (data compressed) version of an uncompressed file e.g. WAV PCM, which in turn is a representation of a time domain signal. – BrechtDeMan Sep 09 '16 at 12:17
  • MP3 is a lossy format that alters and filters frequencies. You can't restore what isn't there anymore. But reading the information you provided you can see that indeed a conversion should/could help because the FFT works on the "raw" data. I just don't know how this relates to JavaScript? Especially on the Client I wouldn't be too sure you're even able to read that kind of data. – Seth Sep 09 '16 at 12:18
  • This is a very complicated problem that many researchers are still working on, and there's no simple one-size-fits-all solution. Forget about MP3 vs WAV though, that is not the issue. You need to get the signal, then do many complicated things with it to get an estimation of the key. – BrechtDeMan Sep 09 '16 at 12:19
  • Okay, but isn't it possible to determine the pitch notation based on the amplitude in the time domain? – Ace Sep 09 '16 at 12:39
  • Here’s a related question about real-time pitch detection (in C#) and my Python implementation of a handful of pitch estimators (harmonic product spectrum, Welch spectrogram, Blackman-Tukey spectral estimator): https://gist.github.com/fasiha/957035272009eb1c9eb370936a6af2eb Your broader question of musical key is one that escapes my very limited understanding of music—can you explain, if you had a sequence of pitches (in Hertz), how would you get musical key out of that? – Ahmed Fasih Sep 09 '16 at 13:33
  • The lossy-vs-lossless issue is, as @BDM states, most likely a non-issue. MP3-decoded audio should have more than enough frequency content for pretty much any kind of frequency-domain pitch estimation. – Ahmed Fasih Sep 09 '16 at 13:34
  • "how would you get musical key out of that?" - well, I would probably make an array based on the [table of note frequencies](https://en.wikipedia.org/wiki/Scientific_pitch_notation). I'm thinking that [this guy was on to something](http://stackoverflow.com/questions/34937030/calculating-the-average-amplitude-of-an-audio-file-using-fft-in-javascript) but not quite sure how to go about the scripting of it. – Ace Sep 09 '16 at 13:57
  • OHH, I got you, I thought you meant “musical key” as in, “piano concerto in C minor” key, not the key of a single note. So what I’d do is the same as I demonstrated in https://gist.github.com/fasiha/957035272009eb1c9eb370936a6af2eb — take an audio clip, break it down into overlapping chunks, and for each chunk run HPS or Blackman-Tukey and extract fundamental frequency, make an array of frequencies, convert them to keys (C, C♯, D…), and finally somehow deal with the fact that the same note will last over several chunks (maybe display them all or shrink runs of the same note into a single one). – Ahmed Fasih Sep 09 '16 at 14:27
  • (The question you linked to, about average amplitudes, doesn’t seem related to this at all.) – Ahmed Fasih Sep 09 '16 at 14:27
  • Do you have some audio (mp3) that I could try this on? – Ahmed Fasih Sep 09 '16 at 14:28
  • Sorry, perhaps I should try to describe what I'm planning to accomplish :) Basically, I want to make a JavaScript version of DJ tool software like ["Mixed In Key" or "KeyFinder"](https://www.youtube.com/watch?v=JqnD9h2Hm7w). In this browser version you should be able to upload a number of mp3 files and get them analysed for both key and bpm. The bpm part I've figured out so it's the key (Camelot Notation) I'm struggling with. Okay, but it is the average/dominant key of a track I want the analysis to end up displaying. So thought perhaps the average frequency amplitude was the way to go. – Ace Sep 09 '16 at 16:11
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/123012/discussion-between-ace-and-ahmed-fasih). – Ace Sep 09 '16 at 16:34
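
The chunk-by-chunk approach described in the comments above ends with two bookkeeping steps: turning each estimated frequency into a note name, and shrinking runs of the same note into one entry. Here is a minimal Python sketch of just those two steps. It assumes equal temperament with A4 = 440 Hz, and the function names and the input frequencies are made up for illustration; the pitch estimator itself (HPS, Blackman-Tukey, etc.) is not shown.

```python
import math
from itertools import groupby

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq_hz, a4_hz=440.0):
    """Return the nearest equal-tempered note name, e.g. 261.6 Hz -> 'C4'."""
    # MIDI note 69 is A4; each semitone is a factor of 2**(1/12) in frequency.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"

def collapse_runs(notes):
    """Shrink consecutive repeats so a note held over several chunks appears once."""
    return [name for name, _ in groupby(notes)]

# Hypothetical per-chunk pitch estimates in Hz (not taken from a real recording).
chunk_pitches_hz = [261.6, 262.0, 261.1, 329.6, 330.2, 392.0]
notes = [freq_to_note(f) for f in chunk_pitches_hz]
print(collapse_runs(notes))  # ['C4', 'E4', 'G4']
```

Rounding to the nearest semitone hides small tuning errors, but it will also hide real estimator mistakes such as octave errors, so it is worth inspecting the raw frequencies as well.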

1 Answer

I created an application, PitchScope Player, that performs pitch detection on MP3 files in realtime. Its complete source code is posted on GitHub, although it is written in C++. Pitch detection and musical key detection, especially in realtime, are extremely demanding and probably need the speed of C++ at this point in time. You have just begun to explore a very difficult audio engineering task, and you really need to first get some background on the physics of how we perceive ‘pitch’ and what a ‘harmonic’ is, and then explore the choices for making a frequency-domain transform of the raw signal (see the Wikipedia link below).

When a single key is pressed on a piano, what we hear is not just one frequency of sound vibration, but a composite of multiple sound vibrations occurring at different, mathematically related frequencies. The elements of this composite are referred to as harmonics or partials. For instance, if we press the Middle C key on the piano, the harmonics start at the fundamental frequency of 261.6 Hz, followed by the 2nd harmonic at about 523 Hz, the 3rd at about 785 Hz, the 4th at about 1046 Hz, etc. Each harmonic is an integer multiple of the fundamental frequency (ex: 2 x 261.6 = 523.2, 3 x 261.6 = 784.8, 4 x 261.6 = 1046.4). We detect pitch by looking for groups of harmonics that have this mathematical relationship in the spacing of their frequencies.
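
To make the harmonic relationship concrete, here is a small Python sketch that lists the harmonics of a fundamental and scores candidate fundamentals by how many of their harmonics line up with observed spectral peaks. This is only an illustration of the idea, not the algorithm used in PitchScope Player; the function names, the 3% tolerance, and the peak list are assumptions made up for the example.

```python
def harmonic_series(fundamental_hz, count):
    """Frequencies of the first `count` harmonics (integer multiples)."""
    return [n * fundamental_hz for n in range(1, count + 1)]

def harmonic_support(candidate_hz, peak_freqs_hz, max_harmonic=6, tolerance=0.03):
    """Count how many of the candidate's first harmonics have a peak nearby.

    `tolerance` is relative: a peak counts if it lies within 3% of the
    expected harmonic frequency (an arbitrary choice for this sketch).
    """
    hits = 0
    for n in range(1, max_harmonic + 1):
        target = n * candidate_hz
        if any(abs(p - target) <= tolerance * target for p in peak_freqs_hz):
            hits += 1
    return hits

def best_fundamental(candidates_hz, peak_freqs_hz):
    """Pick the candidate whose harmonic series is best supported by the peaks."""
    return max(candidates_hz, key=lambda c: harmonic_support(c, peak_freqs_hz))

# Hypothetical spectral peaks for a Middle C note.
peaks = [261.6, 523.2, 784.8, 1046.4]
print([round(f, 1) for f in harmonic_series(261.6, 4)])
print(best_fundamental([130.8, 261.6, 523.2], peaks))
```

Even in this toy case the candidate one octave below (130.8 Hz) matches three of the peaks, which hints at why octave errors are a common failure mode in pitch detection.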

Rather than use an FFT, I use a modified logarithmic DFT so that its frequency channels can be aligned with where the harmonics are located within a musical signal. The logarithmic DFT also gives a distinct speed advantage in execution.
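
As a rough illustration of what logarithmically spaced frequency channels look like, the following Python sketch evaluates a plain DFT at one frequency per semitone instead of at the evenly spaced bins of an FFT. It is not my modified transform and makes no attempt at speed; the function names, the 8000 Hz sample rate, and the synthetic 440 Hz test tone are assumptions chosen for the example.

```python
import cmath
import math

def log_spaced_frequencies(f_min_hz, bins_per_octave, n_bins):
    """Centre frequencies spaced by a constant ratio (12 per octave = semitones)."""
    return [f_min_hz * 2 ** (k / bins_per_octave) for k in range(n_bins)]

def dft_magnitudes_at(samples, sample_rate, freqs_hz):
    """Magnitude of the DFT sum evaluated directly at each requested frequency."""
    n = len(samples)
    mags = []
    for f in freqs_hz:
        w = -2j * math.pi * f / sample_rate
        acc = sum(x * cmath.exp(w * i) for i, x in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags

# Synthetic test signal: a pure 440 Hz sine (A4), about a quarter of a second.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(2048)]

freqs = log_spaced_frequencies(220.0, 12, 25)  # A3 up to A5, one bin per semitone
mags = dft_magnitudes_at(samples, sample_rate, freqs)
peak_hz = freqs[mags.index(max(mags))]
print(round(peak_hz, 1))
```

Because the bins sit on note frequencies, the strongest bin can be read directly as a note, whereas FFT bins are spaced linearly and are coarse relative to a semitone at low pitches.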

Once you have detected numerous pitches in the musical signal, you can detect the musical key by scoring the 12 different key candidates according to how many of the detected notes are members of each key. Another application of mine, PitchScope Navigator, can also detect musical key in realtime.
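
A minimal Python sketch of that kind of scoring is shown below. It assumes the 12 candidates are the major keys and that a note "belongs" to a key if it is in that key's major scale; those choices, the function names, and the example note list are mine for illustration and are not taken from the PitchScope source.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets from the tonic

def score_keys(pitch_classes):
    """Score each of the 12 major keys by how many detected notes it contains.

    `pitch_classes` holds integers 0..11 (0 = C, 1 = C#, ...), one per
    detected note, so repeated notes count more than once.
    """
    counts = [0] * 12
    for pc in pitch_classes:
        counts[pc % 12] += 1
    scores = {}
    for tonic in range(12):
        members = {(tonic + step) % 12 for step in MAJOR_SCALE_STEPS}
        scores[NOTE_NAMES[tonic]] = sum(counts[pc] for pc in members)
    return scores

def best_key(pitch_classes):
    """Return the name of the highest-scoring key candidate."""
    scores = score_keys(pitch_classes)
    return max(scores, key=scores.get)

# Hypothetical detected notes: a C major scale plus a C-E-G arpeggio.
detected = [0, 2, 4, 5, 7, 9, 11, 0, 4, 7]
print(best_key(detected))  # C
```

A plain membership count cannot tell a major key from its relative minor, since both use the same notes, so a real key finder needs more information, such as which notes are emphasised.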

You might want to acquire a C++ compiler and recompile my source code so you can step through its execution to see how my algorithms work. It will also decode an MP3 file. You could also download an executable of that application, PitchScope Player, from numerous places on the web in order to see how it performs on a Windows machine with an MP3 file of your choice.

https://github.com/CreativeDetectors/PitchScope_Player

https://en.wikipedia.org/wiki/Transcription_(music)#Pitch_detection

Below is the image of a Logarithmic DFT (created by my C++ software) for 3 seconds of a guitar solo on a polyphonic mp3 recording. It shows how the harmonics appear for individual notes on a guitar, while playing a solo. For each note on this Logarithmic DFT we can see its multiple harmonics extending vertically, because each harmonic will have the same time-width.