Is it possible to compare 'the shape' of an audio (voice) signal?

Question

Here is an example:

Record a man with a very low-pitched voice saying, "Nice to meet you."
Record a woman with a very high-pitched voice saying, "Nice to meet you." (They speak at almost the same speed, as if the second speaker were trying to mimic the first.)

What I'd like to do is verify whether the two recordings sound like the same sentence, without using speech recognition.

Is this possible with just an FFT?

But men and women speak at different pitches, and a woman's voice usually sits at higher frequencies, so I think it would be hard with just an FFT, because an FFT describes the signal only in the frequency domain.

Thanks.

First step: compare the spectrograms of the two clips. This involves breaking a 1D audio signal into 2D (time & frequency). See this example using Matlab (includes full code in links): http://stackoverflow.com/a/38386589/500207. Do you have example data? I'd love to see what their spectrograms look like and see if you can reliably tell whether the two clips involve the same phrases. My guess: it’ll be very difficult to discriminate semantic content by examining just frequency content—just a guess. — Ahmed Fasih, Jul 20 '16 at 02:35
Also, this may be a good question for http://dsp.stackexchange.com, the digital signal processing-related version of StackOverflow. — Ahmed Fasih, Jul 20 '16 at 02:36
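
The first step suggested in the comments (comparing spectrograms) can be sketched roughly as follows. This is a toy illustration, not a speech-matching method: the two "voices" are synthetic tone sequences standing in for real recordings, and the helper names (`tone_sequence`, `spectrogram`, `peak_track_hz`, `contour`), the frame sizes, and the sample rate are all assumptions made for the example. It shows that the raw spectrograms differ because of pitch, while a mean-removed log-frequency contour has the same shape for both. Real speech has harmonics, formants, and timing differences that this sketch ignores.

```python
import numpy as np

def tone_sequence(freqs_hz, fs, seg_dur=0.25):
    """Toy stand-in for speech: one sine segment per 'syllable'."""
    t = np.arange(int(fs * seg_dur)) / fs
    return np.concatenate([np.sin(2 * np.pi * f * t) for f in freqs_hz])

def spectrogram(x, frame_len=512, hop=256):
    """Power spectrogram from a Hann-windowed short-time FFT.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def peak_track_hz(spec, fs, frame_len=512):
    """Frequency (Hz) of the strongest non-DC bin in each frame."""
    peak_bins = 1 + np.argmax(spec[:, 1:], axis=1)
    return peak_bins * fs / frame_len

def contour(track_hz):
    """Log-frequency contour with its mean removed, so overall pitch drops out."""
    log_f = np.log2(track_hz)
    return log_f - log_f.mean()

fs = 8000  # assumed sample rate in Hz
low = tone_sequence([200, 300, 250], fs)    # "low voice"
high = tone_sequence([400, 600, 500], fs)   # same pattern, one octave up

low_track = peak_track_hz(spectrogram(low), fs)
high_track = peak_track_hz(spectrogram(high), fs)

# Correlate the pitch-normalized contours of the two clips.
similarity = np.corrcoef(contour(low_track), contour(high_track))[0, 1]
print("median peak (low, high):", np.median(low_track), np.median(high_track))
print("contour correlation:", similarity)
```

With real recordings, the peak-frequency track would likely be replaced by richer per-frame features, and the two clips would need time alignment before comparison. SciPy's `scipy.signal.spectrogram` offers a ready-made alternative to the hand-rolled `spectrogram` above.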

0 Answers