extracting pitch features from audio file

Question

I am trying to extract pitch features from an audio file to use in a classification problem. I am using Python (scipy/numpy) for the classification.

I think I can get frequency features using scipy.fft but I don't know how to approximate musical notes using frequencies. I researched a bit and found that I need to get chroma features which map frequencies to 12 bins for notes of a chromatic scale.

I think there's a chroma toolbox for MATLAB, but I don't think there's anything similar for Python.

How should I go forward with this? Could anyone also suggest reading material I should look into?

score 4 · Answer 1 · edited Sep 15 '16 at 00:47

You can map frequencies to musical notes:

n = 12 * log_2(f / Cp) + 69

with n being the MIDI note number to be calculated, f the frequency in Hz and Cp the chamber pitch (in modern music 440.0 Hz is common).
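
As a minimal sketch of the formula above (the function names and the "MIDI 60 = C4" octave labelling are my own assumptions, not part of any library), the mapping could look like this in Python:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_midi(freq_hz, chamber_pitch=440.0):
    """Return the (fractional) MIDI note number for a frequency in Hz."""
    return 12.0 * math.log2(freq_hz / chamber_pitch) + 69.0

def midi_to_name(midi_note):
    """Round to the nearest semitone and return (note name, octave)."""
    nearest = int(round(midi_note))
    # Assumed convention: MIDI note 60 is labelled C4.
    return NOTE_NAMES[nearest % 12], nearest // 12 - 1

n = freq_to_midi(261.63)  # roughly middle C
print(round(n, 2), midi_to_name(n))
```

Taking the note number modulo 12 gives the pitch class, which is the same folding that chroma features rely on.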

As you may know, a single frequency doesn't make a musical pitch. "Pitch" arises from the sensation of the fundamental of harmonic sounds, i.e. sounds that mainly consist of integer multiples of one single frequency (= the fundamental).

If you want to have Chroma Features in Python, you can use the Bregman Audio-Visual Information Toolbox. Note that chroma features don't give you information about the octave of a pitch, so you just get information about the pitch class.

from bregman.suite import Chromagram
audio_file = "mono_file.wav"
F = Chromagram(audio_file, nfft=16384, wfft=8192, nhop=2205)
F.X # all chroma features
F.X[:,0] # one feature
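
If the toolbox is not an option, the basic idea can be sketched with numpy alone. This is a rough illustration and not Bregman's algorithm: the function name, the `fmin`/`fmax` limits and the choice of a Hann window are assumptions. It folds the FFT magnitude of one frame into 12 pitch-class bins using the frequency-to-MIDI mapping given earlier:

```python
import numpy as np

def chroma_from_frame(frame, sample_rate, chamber_pitch=440.0, fmin=50.0, fmax=5000.0):
    """Fold the magnitude spectrum of one audio frame into 12 pitch-class bins."""
    window = np.hanning(len(frame))
    magnitudes = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Keep only bins in a musically useful range (this also drops the 0 Hz bin).
    keep = (freqs >= fmin) & (freqs <= fmax)
    midi = 12.0 * np.log2(freqs[keep] / chamber_pitch) + 69.0
    pitch_class = np.round(midi).astype(int) % 12  # 0 = C, ..., 11 = B

    # Sum the magnitude of every FFT bin into its pitch class.
    chroma = np.zeros(12)
    np.add.at(chroma, pitch_class, magnitudes[keep])
    return chroma

# Synthetic test signal: a 440 Hz sine (the note A, pitch class 9).
sr = 22050
t = np.arange(8192) / sr
chroma = chroma_from_frame(np.sin(2 * np.pi * 440.0 * t), sr)
print(int(np.argmax(chroma)))
```

For a real recording you would apply this frame by frame (with some hop size) and stack the resulting vectors into a chromagram.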

The general problem of extracting pitch information from audio is called pitch detection.

edited Sep 15 '16 at 00:47 by dermen
answered Dec 23 '13 at 17:21 by Frank Zalkow

Thanks a lot... Could you also recommend reading material or books on pitch detection or application of dsp to music in general? – Ada Xu Dec 24 '13 at 13:45
As a general introduction to a wide range of computer music issues, C. Roads' _The Computer Music Tutorial_ (1994, Cambridge: MIT Press) is a very accessible and comprehensive (>1000 pages) reference book. For me the 1st part of M. Müller's _Information Retrieval for Music and Motion_ (2007, Berlin, Heidelberg: Springer) was great (less comprehensive, more up-to-date, more technical). If you are interested in a particular topic, the [proceedings of ISMIR](http://www.ismir.net/proceedings/) are a rich seam of information. Others may give you other (and better?) references. I'd be interested too. – Frank Zalkow Dec 24 '13 at 21:52
Pitch IS the fundamental frequency. The harmonics comprise the timbre (pronounced tamber). For example, a flute and a violin can play the same pitch (fundamental frequency), but their timbre is the harmonic frequency characteristics that make them sound different. – Wyrmwood Dec 25 '13 at 18:58
I think pitch and timbre are not "physical-acoustical" facts, but rather psychoacoustical effects. That's why I wanted to stress that "pitch" arises from the sensation of the fundamental and is not the fundamental itself. Would you agree with that? – Frank Zalkow Dec 26 '13 at 11:10
I have to agree with Frank Zalkow here. Non-harmonic/non-periodic sounds, even modulated noise bursts, can have perceived pitch, so the fundamental frequency is clearly not everything. – Alex I Dec 31 '13 at 02:33

Alex I · Answer 2 · 2013-12-31T02:59:17.343

You can try reading the literature on pitch detection, which is quite extensive. Generally autocorrelation-based methods seem to work pretty well; frequency-domain or zero-crossing methods are less robust (so FFT doesn't really help much). A good starting point may be to implement one of these two algorithms:

YAAPT, from: Stephen A. Zahorian and Hongbing Hu, "A spectral-temporal method for robust fundamental frequency tracking", J. Acoust. Soc. Am. 123, 4559 (2008). http://bingweb.binghamton.edu/~hhu1/paper/Zahorian2008spectral.pdf and MATLAB code here: http://ws2.binghamton.edu/zahorian/yaapt.htm
YIN, from: De Cheveigné, A., Kawahara, H. "YIN, a fundamental frequency estimator for speech and music", J. Acoust. Soc. Am. 111, 1917-1930 (2002). http://audition.ens.fr/adc/pdf/2002_JASA_YIN.pdf
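
To get a feel for the autocorrelation idea before tackling either paper, here is a minimal sketch. It is plain autocorrelation peak picking, not YIN or YAAPT (both add normalization and tracking steps on top), and the function name and the `fmin`/`fmax` search limits are assumptions of mine:

```python
import numpy as np

def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame via autocorrelation."""
    frame = frame - np.mean(frame)
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]

    # Search only the lags that correspond to the allowed frequency range.
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(acf) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Synthetic harmonic tone: 220 Hz fundamental plus two overtones.
sr = 16000
t = np.arange(2048) / sr
tone = sum(np.sin(2 * np.pi * 220.0 * k * t) / k for k in (1, 2, 3))
print(autocorr_pitch(tone, sr))
```

The estimate is quantized to whole-sample lags, so it lands near 220 Hz rather than exactly on it; interpolating around the peak is one common refinement.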

As for off-the-shelf solutions, check out aubio: C code with a Python wrapper, and several pitch-extraction algorithms available, including YIN and multiple-comb.

Thanks a lot :) About aubio, I am finding it a little difficult to implement the examples on this page: http://aubio.org/doc/latest/examples.html. I can't find the methods they've used in their examples in the library and there isn't enough documentation. — Ada Xu, Jan 01 '14 at 10:46

Leftium · Answer 3 · 2014-01-01T06:58:15.273

If you're willing to use 3rd party libraries (at least as a reference for how other people accomplished this):

Extracting musical information from sound, a presentation from PyCon 2012, shows how to use the EchoNest Python API.

Here is the relevant EchoNest documentation:

Track API Methods
Detailed Analyze Documentation

Relevant excerpt:

pitch content is given by a “chroma” vector, corresponding to the 12 pitch classes C, C#, D to B, with values ranging from 0 to 1 that describe the relative dominance of every pitch in the chromatic scale. For example a C Major chord would likely be represented by large values of C, E and G (i.e. classes 0, 4, and 7). Vectors are normalized to 1 by their strongest dimension, therefore noisy sounds are likely represented by values that are all close to 1, while pure tones are described by one value at 1 (the pitch) and others near 0.
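
To make the normalization in that excerpt concrete, here is a small sketch. It is not EchoNest code; the helper names, the 0.5 threshold and the raw energies are made up for illustration. It scales a 12-element vector by its strongest dimension and reads off the dominant pitch classes:

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def normalize_chroma(raw):
    """Scale a 12-element chroma vector so its strongest dimension equals 1."""
    raw = np.asarray(raw, dtype=float)
    peak = raw.max()
    return raw / peak if peak > 0 else raw

def dominant_classes(chroma, threshold=0.5):
    """Return the names of pitch classes whose value reaches the threshold."""
    return [PITCH_CLASSES[i] for i, v in enumerate(chroma) if v >= threshold]

# Made-up energies resembling a C major chord (classes 0, 4 and 7 dominate).
raw = [8.0, 0.5, 0.4, 0.3, 6.0, 0.6, 0.2, 7.0, 0.3, 0.5, 0.2, 0.4]
print(dominant_classes(normalize_chroma(raw)))
```

Vectors of this kind, one per analysis segment, are what you would feed to a classifier.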

EchoNest does the analysis on their servers. They provide free API keys for non-commercial use.

If EchoNest is not an option, I would look at the open-source aubio project. It has python bindings, and you can examine the source to see how they accomplished pitch detection.
