I have audio files, say "left.wav", "right.wav" and so forth. I want to create a model that takes audio as input and outputs a label such as "left" or "right".
Question
How do I feed my raw audio to my neural network?
The scipy.io.wavfile.read() function returns the sample rate and the whole audio as a NumPy array (shape (samples,) for mono, (samples, channels) for multichannel files). You can then feed that array to your network.
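from scipy.io import wavfile  # import the submodule explicitly; "import scipy" alone may not expose scipy.io
rate, numpy_audio = wavfile.read("left.wav")  # rate: samples per second, numpy_audio: the samples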
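In practice the raw array usually needs a little preparation before a network can use it: integer PCM samples are scaled to floats, stereo is mixed down to mono, and every clip is padded or trimmed to the same length so clips can be stacked into one batch. The sketch below assumes NumPy arrays as returned by wavfile.read; the helper names (to_float32, fix_length, make_batch) and the LABELS list are hypothetical, chosen for illustration only.

```python
import numpy as np

LABELS = ["left", "right"]  # hypothetical label set; extend with your own words


def to_float32(audio: np.ndarray) -> np.ndarray:
    """Scale wavfile.read output to float32 in roughly [-1, 1] and mix to mono."""
    if audio.dtype == np.uint8:
        # 8-bit WAV is unsigned and centered on 128.
        audio = (audio.astype(np.float32) - 128.0) / 128.0
    elif np.issubdtype(audio.dtype, np.integer):
        # Signed PCM (int16, int32): divide by the magnitude of the dtype's minimum.
        audio = audio.astype(np.float32) / float(-np.iinfo(audio.dtype).min)
    else:
        audio = audio.astype(np.float32)  # already floating point
    if audio.ndim == 2:
        # Multichannel data has shape (samples, channels); average the channels.
        audio = audio.mean(axis=1)
    return audio


def fix_length(audio: np.ndarray, length: int) -> np.ndarray:
    """Trim long clips and zero-pad short ones so every clip has `length` samples."""
    if len(audio) >= length:
        return audio[:length]
    return np.pad(audio, (0, length - len(audio)))


def make_batch(clips, names, length):
    """Stack clips into an (n_clips, length) array plus integer class labels."""
    x = np.stack([fix_length(to_float32(c), length) for c in clips])
    y = np.array([LABELS.index(n) for n in names])
    return x, y
```

With a 16 kHz sample rate, for example, a length of 16000 gives one-second inputs; x can then go to a dense or 1-D convolutional network and y to a classification loss.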
If you want to do speech recognition, check out DeepSpeech. It's a large project, but you can probably get some good ideas there.
For a simpler intro, TensorFlow has a Simple Audio Recognition tutorial.
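The TensorFlow tutorial does not feed the raw waveform directly; it first converts each clip to a spectrogram with a short-time Fourier transform, which a small 2-D convolutional network then classifies. A NumPy-only sketch of that conversion is below. The frame length of 256 samples, hop of 128 samples and Hann window are assumptions for illustration, not the tutorial's exact settings.

```python
import numpy as np


def spectrogram(audio: np.ndarray, frame_length: int = 256, frame_step: int = 128) -> np.ndarray:
    """Magnitude STFT with shape (n_frames, frame_length // 2 + 1)."""
    audio = np.asarray(audio, dtype=np.float32)
    if len(audio) < frame_length:
        # Pad very short clips so at least one full frame exists.
        audio = np.pad(audio, (0, frame_length - len(audio)))
    n_frames = 1 + (len(audio) - frame_length) // frame_step
    window = np.hanning(frame_length)  # taper each frame to reduce spectral leakage
    frames = np.stack([
        audio[i * frame_step : i * frame_step + frame_length] for i in range(n_frames)
    ])
    # Real FFT of each windowed frame; keep only the magnitude.
    return np.abs(np.fft.rfft(frames * window, axis=1))
```

Taking np.log(spec + 1e-6) afterwards is a common way to compress the dynamic range before training.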
To generate audio, you might want to consider WaveNet; this is one particular implementation.