
I am trying to find a way, using MATLAB, to compare the likeness of short (500 millisecond) recordings of the same note played on different instruments.

Going into detail on this specific topic: I am a music student who has been given the task of objectively characterizing the tone of various modern low brass instruments, in order to determine which instrument should replace the obsolete "ophicleide", or bass keyed bugle. I first used a visual comparison of its spectrogram with those of 6 other instruments, but that approach was too subjective.

I recorded all of the instruments with the same microphone, equipment, gain levels, and the same notes. For this reason, I believe that the signals are similar enough to use MATLAB tools.

I believe that comparing the FFTs is going to be the most accurate calculation. I first tried a frequency-domain correlation, and tested it on different segments of the same tone (eu and eu2 being variables):

>> corr(abs(fft(eu)),abs(fft(eu2)))
ans = 0.9963

That is a step in the right direction, but I seem to get the opposite result when I compare different signals (the euphonium and ophicleide sound almost identical):

>> corr(abs(fft(eu)),abs(fft(ophi)))  
ans =   0.5242

The euphonium and bass clarinet sound completely different, but this shows a higher correlation:

>> corr(abs(fft(eu)),abs(fft(basscl)))   
ans = 0.8506

I also tried a normalized maximum cross-correlation magnitude formula that I found online, but I am getting the same kind of results:

>> norm_max_xcorr_mag = @(x,y)(max(abs(xcorr(x,y)))/(norm(x,2)*norm(y,2))); x =eu2; y = eu; norm_max_xcorr_mag(x,y)
ans =   0.9638

I get a similar result when comparing the other samples:

>> norm_max_xcorr_mag = @(x,y)(max(abs(xcorr(x,y)))/(norm(x,2)*norm(y,2))); x = eu; y = basscl; norm_max_xcorr_mag(x,y)
ans = 0.6825

compared to

>> norm_max_xcorr_mag = @(x,y)(max(abs(xcorr(x,y)))/(norm(x,2)*norm(y,2))); x = eu; y = ophi; norm_max_xcorr_mag(x,y)
ans = 0.3519

The Euphonium and Bass Clarinet (basscl) have a completely different sound, and completely different harmonic series, but these formulas are showing closer correlation than the Euphonium and Ophicleide, whose frequency bands look almost like an identical match.

I am worried that these correlations are reflecting the exact pitch (I am playing the same note on all of these instruments, but the ophicleide might be out of tune by up to 1 Hz). They could also be affected by phase, or even total amplitude.

Does anyone know of a better, clear-cut method for comparing the proportions of the harmonic overtones of these complex waveforms?

Or am I barking up the wrong tree?

TylerH
Euphman
  • Interesting problem! As you say, if the notes are out of tune, I think that can distort the correlation figures. You could perhaps apply a range of pitch shifts to one of the signals (by resampling in the time domain), from say -2 Hz to 2 Hz (for the particular played note), and then choose the shift that gives the highest correlation with the other audio file. That way you will be correcting for the possible lack of tuning. – Luis Mendo Jun 01 '14 at 22:38
  • Did you solve the problem? I would be interested in how you solved for the pitch. Did you use something like the chirp z-transform to get a zoomed FFT spectrum to determine the pitch? – NoDataDumpNoContribution Jul 16 '14 at 09:38

2 Answers


With respect to your specific question, the quantity you've computed is essentially the maximum value of the spectral coherence function. The problem is that the spectral coherence is only a good measure of the correlation between two signals if the signals are statistically stationary, that is, if the probability distribution of frequencies in the signals does not vary with time.

Unfortunately, musical instrument note signals are not likely to be stationary, because the very features most important in classifying the difference between how the same note "sounds" to the human ear on different instruments are due to harmonics and modulations that are more than likely time varying over the duration of the note.

So rather than using the spectral coherence, you need a frequency domain or time-frequency domain metric that better captures the similarity between the non-stationary parts of the note spectra.
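As one possible illustration of such a frequency-domain metric (my own sketch, not something taken from the literature linked below), you could measure the peak magnitude near each harmonic, normalize the resulting "harmonic profile", and compare profiles with a cosine similarity. Searching a small band around each nominal harmonic makes the measure tolerant of a note that is a hertz or so out of tune. The sample rate, fundamental, harmonic weights, and the 3 % tolerance are all assumptions, and the test tones are synthetic stand-ins for real recordings. Only base MATLAB functions are used.

```matlab
% Synthetic stand-ins for the recordings (assumed values, not real data).
fs    = 44100;                         % sample rate in Hz
t     = (0:round(0.5*fs)-1)'/fs;       % 500 ms time axis (column vector)
f0    = 116.5;                         % nominal fundamental in Hz
nHarm = 10;                            % number of harmonics to compare

% Helper: sum of the first nHarm harmonics of f with amplitude weights w.
tone = @(f, w) sin(2*pi*f*t*(1:nHarm)) * w(:);

wFull = 1 ./ (1:nHarm);                % all harmonics present
wOdd  = wFull .* mod(1:nHarm, 2);      % odd harmonics only
% Tone 1: reference. Tone 2: same weights, 1 Hz sharp. Tone 3: odd harmonics.
sigs  = {tone(f0, wFull), tone(f0 + 1, wFull), tone(f0, wOdd)};

N    = numel(t);
win  = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N); % Hann window written out by hand
nfft = 2^nextpow2(4*N);                % zero-padding for a finer frequency grid
f    = (0:nfft/2)'*fs/nfft;            % frequency axis of the one-sided spectrum
tol  = 0.03;                           % search +/-3 % around each nominal harmonic

P = zeros(nHarm, numel(sigs));         % one harmonic profile per column
for k = 1:numel(sigs)
    X = abs(fft(sigs{k}.*win, nfft));
    X = X(1:nfft/2+1);                 % keep the one-sided spectrum
    for h = 1:nHarm
        band   = f >= h*f0*(1-tol) & f <= h*f0*(1+tol);
        P(h,k) = max(X(band));         % peak magnitude near harmonic h
    end
    P(:,k) = P(:,k) / norm(P(:,k));    % remove the overall level
end

cosSim = P' * P;                       % cosine similarity between profiles
disp(cosSim)
```

With real recordings you would replace the entries of `sigs` with your own column vectors. Because each profile is normalized, overall loudness drops out, and because only magnitudes are used, phase does too.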

At this point, it's less of a problem of which MATLAB functions to select (although a look at this example from the Signal Processing Toolbox documentation may help you get started, if you have that toolbox). It is more a question of researching signal processing and feature classification techniques. Here you really have to go to the literature on musical acoustics. Here is just one abstract link - I don't have access to the ACM but you may have access through your university if you are a student.

Good luck with what sounds like an interesting problem!

paisanco
  • You are correct in your observations: the attack, release, and vibrato are other distinguishing characteristics. Vibrato on brass instruments would "supposedly" not have been used in this time period, and it is created or modified by the player. An ophicleide has triple the attack time of a modern valve instrument (35 milliseconds or so); I am including this information in the chapter. The samples that I have are stationary, because I am only discussing "tone", or the spectrum of the instruments. But the article you linked to...!!! The Centroid Mean works so far!! I'll update. – Euphman Jun 08 '14 at 23:06
  • The centroid mean ordered the sounds by perceived "brightness"; this was the best objective measure of tone quality I found. – Euphman Jun 10 '14 at 05:41
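For readers wondering what the centroid mean from the comments might look like in code, here is a minimal sketch under my own assumptions: the spectral centroid is taken as the magnitude-weighted mean frequency of each short frame, and the per-frame values are then averaged. The linked article may define it differently (for example with power rather than magnitude weighting), and the frame length, hop size, and synthetic test tones are illustrative choices only.

```matlab
% Synthetic stand-ins for two recordings (assumed values, not real data).
fs = 44100;                            % sample rate in Hz
t  = (0:round(0.5*fs)-1)'/fs;          % 500 ms time axis
f0 = 116.5;                            % fundamental in Hz
dull   = sin(2*pi*f0*t) + 0.2*sin(2*pi*2*f0*t);
bright = sin(2*pi*f0*t) + 0.8*sin(2*pi*2*f0*t) + 0.6*sin(2*pi*3*f0*t);
sigs   = {dull, bright};

frameLen = 2048;                       % samples per analysis frame
hop      = 1024;                       % samples between frame starts
win = 0.5 - 0.5*cos(2*pi*(0:frameLen-1)'/frameLen);   % Hann window
f   = (0:frameLen/2)'*fs/frameLen;     % one-sided frequency axis in Hz
idx = (1:frameLen)';                   % sample offsets within a frame

centroidMean = zeros(1, numel(sigs));
for k = 1:numel(sigs)
    x = sigs{k};
    nFrames = floor((numel(x) - frameLen)/hop) + 1;
    c = zeros(nFrames, 1);
    for m = 1:nFrames
        seg  = x((m-1)*hop + idx) .* win;
        X    = abs(fft(seg));
        X    = X(1:frameLen/2+1);      % keep the one-sided spectrum
        c(m) = sum(f.*X) / sum(X);     % magnitude-weighted mean frequency
    end
    centroidMean(k) = mean(c);         % average over all frames
end
disp(centroidMean)                     % in Hz; the larger value is the "brighter" tone
```

A higher centroid mean corresponds to more energy in the upper harmonics, which is consistent with the ordering by "brightness" described in the comment.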

I'm not an expert in the subject, but I'm aware of a couple of audio features that can help in such problems: Linear Predictive Coding (LPC) and Mel-Frequency Cepstral Coefficients (MFCCs).

A quick search will reveal plenty of information. As an example I found this one and this one (didn't read them, but they looked relevant).

That should get you started. Depending on your interest, you can go really deep in this topic. For example, one thing is to compare the steady state of the notes played by different instruments, but my understanding is that the transient (attack) is extremely relevant perceptually.
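To give a rough idea of the LPC route, here is a sketch under my own assumptions, not a tested recipe. It estimates linear-prediction coefficients with the autocorrelation method using only base MATLAB, turns them into a smooth spectral envelope in dB, and compares envelopes with an RMS log-spectral distance. The test signals are synthetic: a pulse train passed through made-up resonators. The sample rate, prediction order, and resonance frequencies are arbitrary illustrative values.

```matlab
% Synthetic stand-ins: pulse trains through hypothetical resonators.
fs = 8000;                             % sample rate in Hz (assumed)
N  = round(0.5*fs);                    % 500 ms
pulse = @(f0) double(ismember((1:N)', round(1:fs/f0:N)));
% Denominator of an all-pole filter with resonances at the frequencies in F (Hz).
mkDen = @(F) real(poly([0.95*exp(1j*2*pi*F/fs), 0.95*exp(-1j*2*pi*F/fs)]));

sigs = {filter(1, mkDen([500 1500]), pulse(116.5)), ...  % reference
        filter(1, mkDen([500 1500]), pulse(117.5)), ...  % same envelope, 1 Hz sharp
        filter(1, mkDen([900 2400]), pulse(116.5))};     % different envelope

order = 8;                             % LPC order (assumed)
nfft  = 1024;                          % resolution of the envelope
win   = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);% Hann window written out by hand
E     = zeros(nfft/2+1, numel(sigs));  % one envelope (dB) per column

for k = 1:numel(sigs)
    x = sigs{k} .* win;
    r = zeros(order+1, 1);             % autocorrelation at lags 0..order
    for lag = 0:order
        r(lag+1) = sum(x(1:end-lag) .* x(1+lag:end));
    end
    alpha = toeplitz(r(1:order)) \ r(2:order+1);   % normal equations
    A = fft([1; -alpha], nfft);        % prediction-error filter A(z) on the unit circle
    e = -20*log10(abs(A(1:nfft/2+1))); % envelope 1/|A| in dB
    E(:,k) = e - mean(e);              % remove the overall gain
end

% RMS log-spectral distance (dB) between envelopes i and j.
lsd = @(i, j) sqrt(mean((E(:,i) - E(:,j)).^2));
fprintf('same envelope: %.2f dB, different envelope: %.2f dB\n', lsd(1,2), lsd(1,3));
```

A small distance means the two spectral envelopes have a similar shape regardless of level or small tuning differences. If you have the Signal Processing Toolbox, its `lpc` function could replace the hand-written normal equations.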

Good luck!

jorgeh