I want to compute the spectrogram of a 1-second audio clip for each frame in a video file. I use the tensorflow.contrib.framework.python.ops.audio_ops.audio_spectrogram function to compute the spectrogram. The audio is extracted from the video and sampled at 48 kHz. I am using window_size=480 (0.01 * sample_rate) and stride=240 (0.5 overlap). All my video files are 25 fps and last 1~10 minutes, so I need to compute 25 spectrograms for each second of video.
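To make these numbers concrete, here is a quick sketch of the framing arithmetic in plain Python. It assumes that only full windows are counted (no padding at the clip edges); the variable names are just for illustration.

```python
# Framing arithmetic for one 1-second clip (values taken from the setup above).
sample_rate = 48000   # Hz
fps = 25              # video frames per second
window_size = 480     # samples, 0.01 * sample_rate
stride = 240          # samples, 50% window overlap

# Each video frame moves the start of the 1-second clip by this many samples.
hop_per_video_frame = sample_rate // fps

# Number of full analysis windows in a clip of `sample_rate` samples
# (assumption: no padding, partial windows are dropped).
clip_len = sample_rate
frames_per_clip = 1 + (clip_len - window_size) // stride

# Fraction of samples that one clip shares with the next clip.
clip_overlap = 1 - hop_per_video_frame / clip_len

print(hop_per_video_frame, frames_per_clip, clip_overlap)
```

If this arithmetic is right, consecutive clips start 1920 samples apart, which is exactly 8 strides, so neighbouring clips share most of their samples and most of their spectrogram columns.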
I currently compute each spectrogram by taking the waveform from time x to x+1 seconds and passing it to the audio_spectrogram() function. This is a snippet of how I compute the spectrograms of an audio file:
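import tensorflow as tf
from tensorflow.contrib.framework.python.ops import audio_ops

audio_binary = tf.read_file(filename)
wav = audio_ops.decode_wav(audio_binary)
with tf.Session() as sess:
    # Decode the whole file once; waveform is a NumPy array afterwards.
    waveform, sample_rate = sess.run(wav)

for i in range(25 * video_duration):
    start = i * sample_rate // 25  # first audio sample of video frame i (25 fps)
    # 1-second clip starting at frame i, window_size=480, stride=240
    spect = audio_ops.audio_spectrogram(waveform[start:start + sample_rate], 480, 240)
    # spectrogram post processing...
    with tf.Session() as sess, open(get_output_filename(filename, i), 'wb') as output:
        encode = tf.image.encode_jpeg(spect)
        output.write(sess.run(encode))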
Unfortunately, this code takes a really long time to compute all the spectrograms: it needs 12 hours to process 5 audio files, and I have hundreds of videos to process :(.
Is there any way to speed up this process?
I am thinking of executing the audio_spectrogram() function in batches (operating on something like [batch_size, waveform]), but I don't know how to do it because the waveform argument takes only one waveform at a time. Also, I am not really sure whether running the operation in batches would speed up the process.
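To be concrete about what I mean by a batch, here is a rough NumPy-only sketch of the idea. It is not equivalent to audio_spectrogram(): the Hann window, the FFT length, and the plain magnitude output are my own assumptions, and batch_clips and batch_spectrogram are hypothetical helper names. It also assumes a mono waveform stored as a 1-D array.

```python
import numpy as np

def batch_clips(waveform, sample_rate, fps):
    """Stack every 1-second clip (one per video frame) into [batch, sample_rate]."""
    hop = sample_rate // fps
    n_clips = 1 + (len(waveform) - sample_rate) // hop
    # Row i holds samples [i * hop, i * hop + sample_rate).
    idx = hop * np.arange(n_clips)[:, None] + np.arange(sample_rate)[None, :]
    return waveform[idx]

def batch_spectrogram(clips, window_size=480, stride=240):
    """Magnitude STFT of every row: [batch, frames, window_size // 2 + 1]."""
    n_frames = 1 + (clips.shape[1] - window_size) // stride
    # Row j of idx selects the samples of analysis window j within a clip.
    idx = stride * np.arange(n_frames)[:, None] + np.arange(window_size)[None, :]
    frames = clips[:, idx] * np.hanning(window_size)  # [batch, frames, window]
    return np.abs(np.fft.rfft(frames, axis=-1))

# Toy input: 2 seconds of random mono audio at 48 kHz.
waveform = np.random.RandomState(0).randn(2 * 48000)
clips = batch_clips(waveform, 48000, 25)
spec = batch_spectrogram(clips)
print(clips.shape, spec.shape)
```

Something shaped like this is what I would like to feed to TensorFlow in one call, instead of calling audio_spectrogram() once per video frame.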