I'm working on a very simple speech recognition project. I currently have two sets of wav files. Each set contains 1-second recordings of the same words, spoken by the same person on two different occasions. For example, one set has the words "one", "two", and "three", and the other set has the exact same words from a separate recording session. Many of the words rhyme while using somewhat different sounds.
I've tried several things so far, but the most useful result I have is a spectrogram for each sound file (all constructed the same way using the same script).
This has all been done in MATLAB, and I may only use MATLAB.
I will refer to one set of recordings/spectrograms as the "sample set"; that is the set from which I will provide the sample sound. I will refer to the other set of recordings/spectrograms as the "test set"; that is the set in which I will try to find the best match to the provided sample recording/spectrogram.
What I would like is for MATLAB, when given a sample sound/spectrogram, to return the best match or matches from the test set. Ideally it will return the same word, but realistically I will be very happy if just some of the samples return similar results (e.g. words that rhyme or have similar vowels/consonants).
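To make the goal concrete, the matching step described above could be sketched roughly as follows. This is only a minimal baseline under stated assumptions, not a tested solution: it assumes every spectrogram is a magnitude matrix of the same size (which should hold if they all come from the same script), and it scores each test spectrogram by the normalized correlation of its log-magnitudes with the sample. The variable names (`testSpecs`, `testLabels`, `sampleSpec`) and the random placeholder data are hypothetical stand-ins for the real spectrograms.

```matlab
% Placeholder data (hypothetical): replace with the real spectrogram matrices.
rng(0);                                             % reproducible placeholder data
testSpecs  = {rand(129, 40), rand(129, 40), rand(129, 40)};
testLabels = {'one', 'two', 'three'};
sampleSpec = testSpecs{2} + 0.05 * rand(129, 40);   % stand-in for a second take of "two"

% Flatten the sample, compress with a log, and remove the mean.
a = log(abs(sampleSpec(:)) + eps);
a = a - mean(a);

% Score every test spectrogram by normalized correlation with the sample.
scores = zeros(1, numel(testSpecs));
for k = 1:numel(testSpecs)
    b = log(abs(testSpecs{k}(:)) + eps);
    b = b - mean(b);
    scores(k) = (a' * b) / (norm(a) * norm(b));     % value in [-1, 1]
end

% Rank the test set from best to worst match.
[~, order] = sort(scores, 'descend');
bestLabel  = testLabels{order(1)};
rankedLabels = testLabels(order);
```

Looking at `rankedLabels` rather than only `bestLabel` would show whether near misses are at least similar-sounding words. A direct comparison like this is sensitive to where the word starts within the 1-second clip, so it is only a starting point.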
What are some approaches I could try? Again, it is fine if this fails as long as the process is reasonable. I understand I have a very small sample size of sounds. I also understand it would be best to compare the sounds in the frequency domain, but all I have as of right now are spectrograms.