How to train a CNN on the Common Voice dataset

Question

I am trying to train a CNN with the Common Voice dataset. I am new to speech recognition and have not been able to find any resources on how to use the dataset with Keras. I followed this article to build a simple word classification network, but I want to scale it up with the Common Voice dataset. Any help is appreciated.

Thank you

What is the end goal that you want to achieve? Speech recognition? Or what are your labels? — Edward Aung, Aug 01 '19 at 05:18
The server for the blog article you linked to seems to be down. That makes it impossible to comment in a meaningful way. I'd like to suggest "smaller", answerable questions about concrete problems instead of "how do I do ". — Hendrik, Aug 01 '19 at 06:44
Sorry about the link. For some reason it opens properly through the Medium app on Android, but fails to open through the browser. — Sashaank, Aug 01 '19 at 09:00

Baptiste Pouthier · Accepted Answer · 2019-08-01T09:06:33.183


What you can do is look at MFCCs (mel-frequency cepstral coefficients). In short, these are features extracted from the audio waveform using signal-processing techniques that approximate the way humans perceive sound. In Python, you can use python-speech-features to compute MFCCs.
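To make the idea concrete, here is a minimal NumPy-only sketch of the usual MFCC pipeline: pre-emphasis, framing, power spectrum, mel filterbank, log, and DCT. It is written from scratch for illustration and is not python-speech-features' own code. The function name `mfcc_sketch`, the Hamming window, and the parameter defaults (25 ms windows, 10 ms step, 26 mel filters, 13 coefficients, 512-point FFT) are assumptions chosen to resemble common settings; for real work you would normally call the library instead.

```python
import numpy as np

def hz_to_mel(hz):
    # HTK-style mel-scale conversion
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mfcc_sketch(signal, samplerate=16000, winlen=0.025, winstep=0.01,
                numcep=13, nfilt=26, nfft=512, preemph=0.97):
    """Compute MFCCs for a 1-D signal; returns an array of shape (num_frames, numcep)."""
    signal = np.asarray(signal, dtype=np.float64)
    # 1. Pre-emphasis boosts the high frequencies
    signal = np.append(signal[0], signal[1:] - preemph * signal[:-1])

    # 2. Split into overlapping frames, zero-padding the tail so every sample is covered
    frame_len = int(round(winlen * samplerate))
    frame_step = int(round(winstep * samplerate))
    if len(signal) <= frame_len:
        num_frames = 1
    else:
        num_frames = 1 + int(np.ceil((len(signal) - frame_len) / frame_step))
    pad_len = (num_frames - 1) * frame_step + frame_len
    signal = np.concatenate([signal, np.zeros(pad_len - len(signal))])
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # 3. Power spectrum of each frame
    power = (np.abs(np.fft.rfft(frames, n=nfft)) ** 2) / nfft

    # 4. Triangular mel filters spaced evenly on the mel scale from 0 Hz to Nyquist
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(samplerate / 2), nfilt + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / samplerate).astype(int)
    fbank = np.zeros((nfilt, nfft // 2 + 1))
    for m in range(1, nfilt + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    energies = power @ fbank.T
    # Avoid log(0) for silent frames
    energies = np.where(energies == 0, np.finfo(float).eps, energies)
    log_energies = np.log(energies)

    # 5. Orthonormal DCT-II decorrelates the log energies; keep the first numcep
    n = np.arange(nfilt)
    k = np.arange(numcep)[:, None]
    dct = np.sqrt(2.0 / nfilt) * np.cos(np.pi * k * (2 * n + 1) / (2 * nfilt))
    dct[0] /= np.sqrt(2.0)
    return log_energies @ dct.T

sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic tone standing in for a clip
features = mfcc_sketch(tone, samplerate=sr)
print(features.shape)  # (99, 13)
```

Each row is one 25 ms frame and each column one coefficient, so a clip becomes a small 2-D matrix that a network can consume.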

Once you have prepared your data, you can build a CNN on top of the MFCC features.
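As a rough illustration, a small Keras CNN that treats each clip's MFCC matrix as a single-channel image might look like this. The helper name `build_cnn`, the layer sizes, the 10 word classes, and the input shape (99 frames × 13 coefficients, which is what 25 ms windows with a 10 ms step give for a 1 s clip at 16 kHz) are all assumptions; it also assumes every clip has been padded or trimmed to the same length.

```python
import numpy as np
from tensorflow import keras

def build_cnn(input_shape=(99, 13, 1), num_classes=10):
    """Small 2-D CNN that classifies a fixed-size MFCC 'image' into num_classes words."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(0.3),
        # One probability per word label
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Integer labels (0..num_classes-1), hence the sparse loss
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Random stand-in data: 32 clips of 99 frames x 13 coefficients, integer word labels
x = np.random.rand(32, 99, 13, 1).astype("float32")
y = np.random.randint(0, 10, size=(32,))
model = build_cnn()
model.fit(x, y, epochs=1, batch_size=8, verbose=0)
probs = model.predict(x[:4], verbose=0)
```

With real data, `x` would hold the stacked MFCC matrices and `y` the index of the spoken word for each clip.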

You can also use RNNs (LSTM or GRU for example), but this is a bit more advanced.
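A minimal sketch of the recurrent alternative, under the same assumptions as a fixed-size CNN input (99 frames × 13 MFCCs, 10 hypothetical word classes): here the frames are read as a time sequence instead of an image. The helper name `build_rnn` and the layer sizes are illustrative only, and `keras.layers.LSTM` can be swapped in for `GRU`.

```python
import numpy as np
from tensorflow import keras

def build_rnn(num_frames=99, num_mfcc=13, num_classes=10):
    """GRU-based word classifier that reads MFCC frames as time steps."""
    model = keras.Sequential([
        keras.Input(shape=(num_frames, num_mfcc)),
        # return_sequences=True so the second GRU receives every time step
        keras.layers.GRU(64, return_sequences=True),
        # The last GRU returns only its final state, summarizing the whole clip
        keras.layers.GRU(64),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Random stand-in batch: 8 clips, no channel axis needed for the RNN
x = np.random.rand(8, 99, 13).astype("float32")
rnn = build_rnn()
probs = rnn.predict(x, verbose=0)
```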

EDIT: A very good dataset to start with, if you want:

Speech Commands Dataset

edited Aug 01 '19 at 09:06

answered Aug 01 '19 at 07:23

Baptiste Pouthier


Thanks for the reply, I will certainly do that. A follow-up question: the Common Voice dataset is a compilation of sentences spoken by various people. For speech recognition, should I convert these sentences to words? – Sashaank Aug 01 '19 at 08:58
It's much easier to work with labeled words than with sentences. You can work with sentences using, for example, an RNN trained with CTC loss, but that is quite advanced. You may want to practice with words first! If you want a dataset with already prepared words, take a look at the Google Speech Commands dataset (I'll put the link in my answer). It is a very good dataset to start with. – Baptiste Pouthier Aug 01 '19 at 09:05
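For sentence-level data such as Common Voice, a minimal sketch of the RNN + CTC setup might look like the following. Everything here is an assumption for illustration: the lowercase character vocabulary, the helper names (`encode_sentence`, `build_ctc_model`, `ctc_loss`), and the random stand-in features; real transcripts and features would come from the dataset. It uses TensorFlow's `tf.nn.ctc_loss` with index 0 reserved for the CTC blank, so no word-level alignment of the audio is needed.

```python
import tensorflow as tf

# Hypothetical character vocabulary; id 0 is reserved for the CTC blank
VOCAB = list(" abcdefghijklmnopqrstuvwxyz'")
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(VOCAB)}
NUM_CLASSES = len(VOCAB) + 1  # +1 for the blank

def encode_sentence(sentence):
    """Map a transcript to integer ids, dropping characters outside the vocabulary."""
    return [CHAR_TO_ID[c] for c in sentence.lower() if c in CHAR_TO_ID]

def build_ctc_model(num_mfcc=13):
    # Variable-length input (time, num_mfcc); output is per-frame character logits
    inputs = tf.keras.Input(shape=(None, num_mfcc))
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(64, return_sequences=True))(inputs)
    logits = tf.keras.layers.Dense(NUM_CLASSES)(x)  # no softmax: ctc_loss takes logits
    return tf.keras.Model(inputs, logits)

def ctc_loss(labels, label_lengths, logits, logit_lengths):
    # labels: dense int32 [batch, max_label_len], padded; logits: [batch, time, NUM_CLASSES]
    loss = tf.nn.ctc_loss(labels=labels, logits=logits,
                          label_length=label_lengths, logit_length=logit_lengths,
                          logits_time_major=False, blank_index=0)
    return tf.reduce_mean(loss)

model = build_ctc_model()
optimizer = tf.keras.optimizers.Adam(1e-3)

# Stand-in batch: two "clips" of 120 MFCC frames and their transcripts
features = tf.random.normal((2, 120, 13))
encoded = [encode_sentence("hello world"), encode_sentence("yes")]
max_len = max(len(e) for e in encoded)
labels = tf.constant([e + [0] * (max_len - len(e)) for e in encoded], dtype=tf.int32)
label_lengths = tf.constant([len(e) for e in encoded], dtype=tf.int32)
logit_lengths = tf.constant([120, 120], dtype=tf.int32)

# One training step
with tf.GradientTape() as tape:
    logits = model(features, training=True)
    loss = ctc_loss(labels, label_lengths, logits, logit_lengths)
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

At inference time the per-frame predictions still have to be decoded (for example with `tf.nn.ctc_greedy_decoder`), which is part of why this route is more involved than word classification.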
