
I have audio files named "left.wav", "right.wav", and so on. I want to create a model that takes audio as input and outputs a label such as "left" or "right".

Question

How do I feed my raw audio to my neural network?

Kzryzstof
martian1231

1 Answer


The scipy.io.wavfile.read() function returns the sample rate and the entire audio signal as a NumPy array.

You can then feed that to your network.

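from scipy.io import wavfile  # import the submodule explicitly; a bare "import scipy" may not load scipy.io

# rate is the sample rate in Hz; numpy_audio is a NumPy array holding the samples
rate, numpy_audio = wavfile.read("left.wav")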
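Most networks expect fixed-size floating-point input, so the raw array usually needs a little preparation first. Here is a minimal sketch of one way to do that, not the only one. The helper name `prepare_audio` and the `target_len` parameter are made up for illustration, and the synthetic sine wave only stands in for the array returned by `wavfile.read()`. It assumes signed integer PCM (such as 16-bit WAV) or float samples.

```python
import numpy as np

def prepare_audio(samples, target_len):
    """Convert PCM samples to a fixed-length float32 vector in [-1, 1]."""
    samples = np.asarray(samples)
    if np.issubdtype(samples.dtype, np.signedinteger):
        # Scale signed integer PCM (e.g. int16) by the magnitude of its most negative value.
        scale = float(-np.iinfo(samples.dtype).min)
        audio = samples.astype(np.float32) / scale
    else:
        # Assumption: float WAV data is already in [-1, 1].
        audio = samples.astype(np.float32)
    if audio.ndim == 2:
        # Stereo arrays have shape (n_samples, n_channels); average to mono.
        audio = audio.mean(axis=1)
    # Truncate long clips and zero-pad short ones so every example has the same length.
    audio = audio[:target_len]
    padded = np.zeros(target_len, dtype=np.float32)
    padded[:len(audio)] = audio
    return padded

# Synthetic half-second 440 Hz tone standing in for the output of wavfile.read().
rate = 16000
t = np.arange(rate // 2) / rate
numpy_audio = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

x = prepare_audio(numpy_audio, target_len=rate)  # one second at 16 kHz
batch = np.stack([x])                            # add a batch dimension for the network
```

The resulting `batch` can be passed to whatever framework you use. Many speech models work on spectrograms or MFCCs rather than raw samples, but the fixed-length float vector is a reasonable starting point either way.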

If you want to do speech recognition, check out DeepSpeech. It is a large project, but you can probably get some good ideas from it.

For a simpler introduction, TensorFlow has a Simple Audio Recognition tutorial.

To generate audio, you might want to consider WaveNet - this is one particular implementation.

Peter Szoldan