How do I create and train a TensorFlow model with audio inputs?

Question

I have audio files, say "left.wav", "right.wav" and so forth. I want to create a model that takes audio as input and outputs a label such as "left" or "right".

How do I feed my raw audio to my neural network?

Peter Szoldan · Accepted Answer · 2018-04-17T00:50:55.787


The scipy.io.wavfile.read() function returns the sample rate and the audio samples of the whole file as a NumPy array.

You can then feed that to your network.

# Import the submodule explicitly; a bare "import scipy" may not load scipy.io.wavfile
import scipy.io.wavfile
rate, numpy_audio = scipy.io.wavfile.read("left.wav")
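Before the array goes into a network, it usually helps to make every clip the same shape and scale. Here is a minimal sketch of that step, not part of the original answer. The helper name prepare_audio and the target_len parameter are made up for illustration, and it assumes signed integer PCM (such as int16) or float samples:

```python
import numpy as np

def prepare_audio(samples, target_len):
    """Convert raw WAV samples to a fixed-length float32 vector (hypothetical helper)."""
    samples = np.asarray(samples)
    # Assumption: integer data is signed PCM (e.g. int16), so dividing by
    # max + 1 maps it into [-1, 1). Float data is kept as-is.
    if np.issubdtype(samples.dtype, np.integer):
        scale = float(np.iinfo(samples.dtype).max) + 1.0
        audio = samples.astype(np.float32) / scale
    else:
        audio = samples.astype(np.float32)
    # Stereo files have shape (n_samples, n_channels); average down to mono.
    if audio.ndim == 2:
        audio = audio.mean(axis=1)
    # Trim long clips, then zero-pad short ones up to target_len.
    audio = audio[:target_len]
    padding = target_len - audio.shape[0]
    return np.pad(audio, (0, padding))

# Synthetic stand-ins for arrays returned by scipy.io.wavfile.read()
clips = [
    np.array([0, 16384, -32768], dtype=np.int16),  # short mono clip
    np.zeros((5, 2), dtype=np.int16),              # longer stereo clip
]
batch = np.stack([prepare_audio(c, target_len=4) for c in clips])
labels = np.array([0, 1])  # hypothetical mapping: 0 = "left", 1 = "right"
```

The resulting batch and labels are plain NumPy arrays, which is a form that Keras-style fit() calls accept. In practice target_len would be something like one second of samples at the file's sample rate.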

If you want to do speech recognition, check out DeepSpeech. It's a large project, but you can probably get some good ideas there.

For a simpler intro, TensorFlow has a Simple Audio Recognition tutorial.

To generate audio, you might want to consider WaveNet - this is one particular implementation.

edited Apr 17 '18 at 00:50

answered Apr 16 '18 at 21:08


That's it? Nothing more? – martian1231 Apr 17 '18 at 00:33
Is it too easy? :) – Peter Szoldan Apr 17 '18 at 00:33
Are you trying to do speech recognition? – Peter Szoldan Apr 17 '18 at 00:36
Added some links for you in my answer – Peter Szoldan Apr 17 '18 at 00:51
Hi, thank you so much. Yes, I'm just trying to create a basic speech recognition model using TensorFlow. :) – martian1231 Apr 17 '18 at 01:11
