Is comparing two audio files using FFT the only way?

Question

I am new to signal processing and am trying to compare two audio files using an FFT. I read each file into bytes, convert the bytes to complex numbers, and pass them to the FFT. I then calculate the magnitude of each complex number in the FFT output and compare the magnitudes of the two files, but they do not match.

Please let me know if I am missing anything.

Is there any other way to compare two audio files?

Are the two audio files the same? How "different" are they? What constitutes "equal" for your application? — Jim Garrison, Oct 23 '13 at 23:35
They can be MP3 or WAV. Both files being compared will be in the same format. — user2913531, Oct 24 '13 at 00:30
I have to determine whether one audio file is derived from the other. — user2913531, Oct 24 '13 at 00:31
Again, what do you mean by "derived"? Do you have any hard criteria for reducing "derived" to an algorithm that can be implemented? For instance, what if the two files are the same source but one has been sped up by, say, 10%. What if they are the same but at different pitch? This is an extremely hard problem. — Jim Garrison, Oct 24 '13 at 04:49
I have a WAV file (40 sec), and the second file (10 sec) is extracted from the first. I want to compare these files and determine that they are the same. The second file may be sped up by 10% or be at a different pitch, so byte comparison does not work. I want to know other approaches to this problem. Thanks — user2913531, Oct 24 '13 at 23:28

score 1 · Answer 1 · answered Oct 29 '13 at 16:24

In general, the FFT of the complete file will not equal the FFT of an excerpt. Consider a 40 sec. file that contains four 10 sec. segments of sine waves at 20Hz, 40Hz, 60Hz and 80Hz, respectively.

The corresponding spectrum for the whole file would show peaks at those four frequencies, but any 10 sec. excerpt would have two of them at most. Hence, they do not match.
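To make this concrete, here is a minimal sketch of that example using NumPy. The 400 Hz sample rate, the 0.5 peak threshold, and the helper name dominant_frequencies are assumptions chosen for illustration; they are not part of the original answer.

```python
import numpy as np

SAMPLE_RATE = 400      # Hz; assumed value, well above twice the highest tone (80 Hz)
SEGMENT_SECONDS = 10   # each tone lasts 10 s, as in the example above
TONES_HZ = [20, 40, 60, 80]


def dominant_frequencies(samples, threshold=0.5):
    """Frequencies (rounded to whole Hz) whose FFT magnitude exceeds
    threshold * the largest magnitude in the spectrum."""
    magnitudes = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / SAMPLE_RATE)
    strong = freqs[magnitudes > threshold * magnitudes.max()]
    return sorted({int(round(float(f))) for f in strong})


# Build the 40 s file: four consecutive 10 s sine segments.
t = np.arange(SEGMENT_SECONDS * SAMPLE_RATE) / SAMPLE_RATE
full = np.concatenate([np.sin(2 * np.pi * f * t) for f in TONES_HZ])

# A 10 s excerpt starting at second 5 straddles the 20 Hz and 40 Hz segments.
start = 5 * SAMPLE_RATE
excerpt = full[start:start + SEGMENT_SECONDS * SAMPLE_RATE]

full_peaks = dominant_frequencies(full)
excerpt_peaks = dominant_frequencies(excerpt)
print("full file:", full_peaks)
print("excerpt:  ", excerpt_peaks)
```

In this sketch the full file reports all four tones, while the excerpt reports only 20 Hz and 40 Hz, so comparing the two spectra directly cannot succeed.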

Now, what you're trying to do sounds a bit like Shazam, and luckily, they've released a research paper on how it works. Maybe that will solve your problem.

For another approach (albeit one that might not be able to deal with pitch and speed changes), consider the implications of my example above: You shouldn't try to match a spectrum that was computed over 40 sec. against one that represents only 10 sec. So you'll have to find which 10 sec. segment of the original file the second file was taken from.

To achieve this, you could use a simple sliding window (start with the data from seconds 1 through 10, then 2 through 11, and so on), or you could chop the second file into even smaller chunks and combine the initial sliding window with techniques from string searching.
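The simple sliding window could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 1 sec. hop, the use of cosine similarity between magnitude spectra, and the seeded noise standing in for real audio are all made up for the example. As noted above, it will not cope with pitch or speed changes.

```python
import numpy as np


def magnitude_spectrum(samples):
    """Magnitude of the real-input FFT of a block of samples."""
    return np.abs(np.fft.rfft(samples))


def best_window_offset(long_samples, clip, sample_rate, hop_seconds=1.0):
    """Slide a clip-sized window over long_samples and return
    (offset_in_seconds, cosine_similarity) for the best-matching window."""
    window = len(clip)
    hop = max(1, int(hop_seconds * sample_rate))
    clip_spec = magnitude_spectrum(clip)
    clip_norm = np.linalg.norm(clip_spec)
    best_offset, best_similarity = 0.0, -1.0
    for start in range(0, len(long_samples) - window + 1, hop):
        spec = magnitude_spectrum(long_samples[start:start + window])
        denom = clip_norm * np.linalg.norm(spec)
        similarity = float(spec @ clip_spec / denom) if denom > 0 else 0.0
        if similarity > best_similarity:
            best_offset, best_similarity = start / sample_rate, similarity
    return best_offset, best_similarity


# Hypothetical test data: 40 s of seeded noise stands in for the original file,
# and the "second file" is the 10 s cut out of it starting at second 12.
SAMPLE_RATE = 400  # Hz; assumed
rng = np.random.default_rng(0)
original = rng.standard_normal(40 * SAMPLE_RATE)
clip = original[12 * SAMPLE_RATE:22 * SAMPLE_RATE]

offset, similarity = best_window_offset(original, clip, SAMPLE_RATE)
print(offset, round(similarity, 3))
```

Because the hop is a whole second, a clip that starts between hops will only match approximately; a smaller hop trades run time for finer alignment.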
