I am currently training a classifier with PyTorch and torchaudio, following this tutorial: https://towardsdatascience.com/audio-deep-learning-made-simple-sound-classification-step-by-step-cebc936bbe5
This all works like a charm, and my classifier can now successfully classify .wav files. However, I would like to turn it into a real-time classifier that can also classify recordings from a microphone/loopback input.
Ideally, I would not have to save a recording to a .wav file and load it again, but could instead feed the classifier directly with an in-memory recording.
The tutorial uses the .load function of torchaudio to load a .wav file and return a waveform and sample rate as follows:
sig, sr = torchaudio.load(audio_file)
Loopback is pretty much required, and since pyaudio apparently does not support loopback devices yet (except for a fork that is very likely outdated), I stumbled across soundcard: https://soundcard.readthedocs.io/en/latest/
I found this code to yield a recording of my speaker loopback:
import time

import soundcard as sc

# get a list of all speakers:
speakers = sc.all_speakers()
# get the current default speaker on your system:
default_speaker = sc.default_speaker()
# get a list of all microphones, including loopback devices:
mics = sc.all_microphones(include_loopback=True)
# use the first entry of that list as the recording device:
default_mic = mics[0]
with default_mic.recorder(samplerate=148000) as mic, \
        default_speaker.player(samplerate=148000) as sp:
    print("Recording...")
    # data is a frames x channels float32 numpy array
    data = mic.record(numframes=1000000)
    print("Done...Stop your sound so you can hear playback")
    time.sleep(5)
    sp.play(data)
However, I of course don't want to play that audio with the .play function but instead pass it on to torchaudio/the classifier. Since I am new to the world of audio processing, I have no idea how to get this data into a suitable format similar to the one returned by torchaudio. According to the soundcard docs, the data has the following format:
The data will be returned as a frames × channels float32 numpy array
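To make the format gap concrete: with its default arguments, torchaudio.load returns a float32 tensor laid out as channels × frames, while soundcard returns frames × channels. Below is a minimal sketch of the conversion I imagine is needed, assuming the soundcard samples are already in roughly the same [-1, 1] range that torchaudio produces (I have not verified this). The helper names `to_channels_first` and `to_waveform` are my own and are not part of either library, and the zero-filled array is only a stand-in for a real recording.

```python
import numpy as np


def to_channels_first(data: np.ndarray) -> np.ndarray:
    """Turn a (frames, channels) recording into a contiguous (channels, frames) float32 array."""
    data = np.asarray(data, dtype=np.float32)
    if data.ndim == 1:  # treat a flat array as a single channel
        data = data[:, np.newaxis]
    # Transpose, then copy so the memory layout matches the new shape.
    return np.ascontiguousarray(data.T)


def to_waveform(data: np.ndarray):
    """Wrap the channels-first array in a torch tensor (hypothetical helper, needs PyTorch)."""
    import torch  # imported lazily so the NumPy part works on its own

    return torch.from_numpy(to_channels_first(data))


# Stand-in for `mic.record(numframes=1000)`: 1000 frames, 2 channels.
fake_recording = np.zeros((1000, 2), dtype=np.float32)
channels_first = to_channels_first(fake_recording)
print(channels_first.shape)  # (2, 1000)
```

If this is right, `sig = to_waveform(data)` together with `sr = 148000` (the sample rate passed to the recorder) would stand in for the `sig, sr` pair that the tutorial gets from torchaudio.load.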
As a last resort, maybe saving it into an in-memory .wav file and then reading it with torchaudio is possible? Any help is appreciated. Thank you in advance!
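For the last-resort idea, this is roughly what I have in mind: a sketch that encodes the recording as 16-bit PCM WAV into an io.BytesIO buffer using only the standard-library wave module. The function name `to_wav_buffer` is my own, the zero-filled array is a stand-in for a real recording, and the clipping to [-1, 1] is an assumption about the sample range. Whether torchaudio.load accepts such a file-like object seems to depend on the torchaudio version and backend, so that last step is only indicated in a comment.

```python
import io
import wave

import numpy as np


def to_wav_buffer(data: np.ndarray, samplerate: int) -> io.BytesIO:
    """Encode a (frames, channels) float array as 16-bit PCM WAV in memory."""
    data = np.asarray(data, dtype=np.float32)
    if data.ndim == 1:  # treat a flat array as a single channel
        data = data[:, np.newaxis]
    # Assumption: samples lie in [-1, 1]; clip, then scale to little-endian int16.
    pcm = (np.clip(data, -1.0, 1.0) * 32767.0).astype("<i2")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(pcm.shape[1])
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav_file.setframerate(samplerate)
        wav_file.writeframes(pcm.tobytes())  # row-major bytes are interleaved frames
    buffer.seek(0)  # rewind so a reader starts at the WAV header
    return buffer


# Stand-in for a real recording: 1000 frames, 2 channels, same rate as the recorder.
wav_buffer = to_wav_buffer(np.zeros((1000, 2), dtype=np.float32), 148000)

# Sanity check by reading the header back with the same stdlib module.
with wave.open(wav_buffer, "rb") as check:
    info = (check.getnchannels(), check.getframerate(), check.getnframes())
print(info)  # (2, 148000, 1000)

wav_buffer.seek(0)
# Untested assumption: sig, sr = torchaudio.load(wav_buffer)
```

This round trip quantizes the float samples to 16 bit, so converting the array directly would avoid both the extra encoding step and that loss of precision.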