I created an application, PitchScope Player, that performs pitch detection on MP3 files in realtime, and its complete source code is posted on GitHub (link below); however, it is written in C++. Pitch detection and musical key detection, especially in realtime, are extremely demanding and, at this point in time, probably need the speed of C++. You have just begun to explore a very difficult audio engineering task, and you really need to first get some background on the physics of how we perceive ‘pitch’ and on what a ‘harmonic’ is, and then explore the choices for making a frequency-domain transform from the raw signal (see the Wikipedia link below).
When a single key is pressed on a piano, what we hear is not just one frequency of sound vibration, but a composite of multiple sound vibrations occurring at different, mathematically related frequencies. The elements of this composite are referred to as harmonics or partials. For instance, if we press the Middle C key on the piano, the composite's harmonics start at the fundamental frequency of 261.6 Hz (the 1st harmonic); 523 Hz is the 2nd harmonic, 785 Hz the 3rd harmonic, 1046 Hz the 4th harmonic, etc. The later harmonics are integer multiples of the fundamental frequency, 261.6 Hz (ex: 2 x 261.6 = 523, 3 x 261.6 = 785, 4 x 261.6 = 1046, rounded to the nearest Hz). We detect pitch by looking for groups of harmonics that have this mathematical relationship in the spacing of their frequencies.
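To make the harmonic relationship concrete, here is a minimal Python sketch of the idea. It is not the algorithm used in PitchScope Player: the function names, the 3% tolerance, and the list of candidate fundamentals are all assumptions chosen for illustration. It scores each candidate fundamental by how many of its first four harmonics line up with peaks found in a spectrum.

```python
def harmonic_frequencies(fundamental_hz, count):
    """Return the first `count` harmonics; the 1st harmonic is the fundamental."""
    return [n * fundamental_hz for n in range(1, count + 1)]


def harmonic_score(candidate_hz, peaks_hz, count=4, tolerance=0.03):
    """Count how many of the candidate's first `count` harmonics have a
    detected peak within `tolerance` (a fraction of the harmonic's frequency).
    The tolerance value is an arbitrary assumption for this sketch."""
    score = 0
    for harmonic in harmonic_frequencies(candidate_hz, count):
        if any(abs(peak - harmonic) <= tolerance * harmonic for peak in peaks_hz):
            score += 1
    return score


def best_fundamental(candidates_hz, peaks_hz):
    """Pick the candidate fundamental whose harmonics match the most peaks."""
    return max(candidates_hz, key=lambda c: harmonic_score(c, peaks_hz))


# Spectral peaks as they might appear for Middle C (values from the text above).
peaks = [261.6, 523.0, 785.0, 1046.0]
# Hypothetical candidate fundamentals: A3, C4, E4, G4.
candidates = [220.0, 261.6, 329.6, 392.0]

print(best_fundamental(candidates, peaks))
```

A real detector has to do much more than this, for example deciding between a note and the note an octave below it, whose harmonics partly coincide.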
Rather than use an FFT, I use a modified Logarithmic DFT so that its frequency channels can be aligned to where the harmonics are located within a musical signal. The Logarithmic DFT also gives a distinct speed advantage in execution.
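As a rough illustration of what a logarithmically spaced DFT means, the Python sketch below evaluates the DFT sum directly at a chosen list of frequencies, one channel per semitone, instead of at the evenly spaced bins an FFT produces. This is a plain, unwindowed textbook version and not my modified transform; the function names, the 8000 Hz sample rate, and the 220 Hz starting channel are assumptions for the example.

```python
import cmath
import math


def log_spaced_frequencies(low_hz, channels, per_octave=12):
    """Channel centre frequencies spaced by equal ratios (semitones by default)."""
    return [low_hz * 2.0 ** (k / per_octave) for k in range(channels)]


def dft_magnitude_at(samples, sample_rate, freq_hz):
    """Magnitude of the DFT sum evaluated at one arbitrary frequency."""
    acc = 0j
    for i, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
    return abs(acc) / len(samples)


def log_dft(samples, sample_rate, freqs):
    """One magnitude per channel, computed only at the listed frequencies."""
    return [dft_magnitude_at(samples, sample_rate, f) for f in freqs]


# Test signal: a pure 440 Hz sine, 2048 samples at an assumed 8000 Hz rate.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(2048)]

# 25 semitone channels covering two octaves upward from 220 Hz.
freqs = log_spaced_frequencies(220.0, 25)
mags = log_dft(samples, sample_rate, freqs)

peak_index = max(range(len(mags)), key=mags.__getitem__)
print(peak_index, round(freqs[peak_index], 1))
```

Because the channels sit on note frequencies, the strongest channel here is the one centred on 440 Hz, and the harmonics of a note land on predictable channels above it.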
Once you have detected numerous pitches in the musical signal, you can detect the Musical Key by scoring each of the 12 different Key Candidates according to how many of the detected notes are members of that key. Another application of mine, PitchScope Navigator, can also detect Musical Key in realtime.
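The scoring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in PitchScope Navigator: it assumes the 12 candidates are the major keys, represents each detected note as a pitch class (0 = C, 1 = C#, ... 11 = B), and simply counts how many detected notes belong to each key's scale.

```python
# Semitone offsets of the major scale from its tonic.
MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)
NOTE_NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")


def key_scores(pitch_classes):
    """Score each of the 12 major-key candidates by counting the detected
    notes that are members of that key's scale."""
    counts = [0] * 12
    for pc in pitch_classes:
        counts[pc % 12] += 1
    scores = []
    for tonic in range(12):
        members = {(tonic + step) % 12 for step in MAJOR_SCALE_STEPS}
        scores.append(sum(counts[pc] for pc in members))
    return scores


def best_key(pitch_classes):
    """Name of the highest-scoring key; ties go to the first candidate."""
    scores = key_scores(pitch_classes)
    return NOTE_NAMES[max(range(12), key=scores.__getitem__)]


# Hypothetical detected notes: G A B C D E F# G D B G
detected = [7, 9, 11, 0, 2, 4, 6, 7, 2, 11, 7]
print(best_key(detected))
```

With these notes, the G candidate contains every detected note, while neighbouring candidates such as C and D each miss one, so G scores highest. Short passages often leave several candidates tied, so a longer stretch of detected notes gives a more reliable result.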
You might want to acquire a C++ compiler and recompile my source code so you can step through its execution to see how my algorithms work. The application also decodes MP3 files. You could also download an executable of PitchScope Player from numerous places on the web in order to see how it performs on a Windows machine with an MP3 file of your choice.
https://github.com/CreativeDetectors/PitchScope_Player
https://en.wikipedia.org/wiki/Transcription_(music)#Pitch_detection
Below is the image of a Logarithmic DFT (created by my C++ software) for 3 seconds of a guitar solo on a polyphonic MP3 recording. It shows how the harmonics appear for individual notes on a guitar while playing a solo. For each note on this Logarithmic DFT we can see its multiple harmonics stacked vertically, because each harmonic of a note has the same time-width.
