How to speed up spectrogram computation with tensorflow?

Question

I want to compute the spectrogram of a 1-second audio clip for each frame in a video file.

I use the tensorflow.contrib.framework.python.ops.audio_ops.audio_spectrogram function to compute the spectrogram.

The audio is extracted from the video and sampled at 48 kHz. I am using window_size=480 (0.01 * sample_rate) and stride=240 (0.5 overlap). All my video files run at 25 fps and last 1 to 10 minutes, so I need to compute 25 spectrograms for each second of video.
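To make these numbers concrete, here is a small sketch of the window arithmetic, assuming exactly the parameters stated above (48 kHz, 25 fps, window 480, stride 240). The variable names are illustrative only. It shows how many STFT frames one 1-second clip yields and how much two consecutive clips overlap.

```python
# Window arithmetic for the parameters in the question (names are illustrative).
sample_rate = 48000   # Hz
fps = 25              # video frames per second
window_size = 480     # 0.01 * sample_rate
stride = 240          # 50% overlap between STFT windows

# Audio samples between the starts of two consecutive video frames.
samples_per_video_frame = sample_rate // fps                        # 1920

# STFT frames produced by one 1-second clip (no padding at the end).
stft_frames_per_clip = 1 + (sample_rate - window_size) // stride    # 199

# Clip offset expressed in strides; a remainder of 0 means clips are stride-aligned.
strides_per_video_frame, remainder = divmod(samples_per_video_frame, stride)  # (8, 0)

# Fraction of samples shared by two consecutive 1-second clips (about 0.96).
clip_overlap = 1 - samples_per_video_frame / sample_rate

print(samples_per_video_frame, stft_frames_per_clip,
      strides_per_video_frame, remainder, clip_overlap)
```

Because the clip offset is a whole number of strides and consecutive clips share about 96% of their samples, most of the work done for one clip repeats what was already computed for the previous one.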

I currently compute each spectrogram by taking the waveform from time x to x+1 seconds and passing it to the audio_spectrogram() function. This is a snippet showing how I compute the spectrograms of an audio file:

import tensorflow as tf
from tensorflow.contrib.framework.python.ops import audio_ops

audio_binary = tf.read_file(filename)
wav = audio_ops.decode_wav(audio_binary)

with tf.Session() as sess:
    waveform, sample_rate = sess.run(wav)

for i in range(25 * video_duration):
    start = i * sample_rate // 25  # first audio sample of video frame i (25 fps)
    spect = audio_ops.audio_spectrogram(waveform[start:start + sample_rate], 480, 240)

    # spectrogram post processing...

    # a new session is opened and the output file is written in binary mode for every frame
    with tf.Session() as sess, open(get_output_filename(filename, i), 'wb') as output:
        encode = tf.image.encode_jpeg(spect)
        output.write(sess.run(encode))

Unfortunately, this code takes a really long time to compute all the spectrograms. It needs 12 hours to fully compute the spectrograms of 5 audio files, and I have hundreds of videos to process :(

Is there any way to speed up this process?

I am thinking of executing the audio_spectrogram() function in batches (operating on something like [batch_size, waveform]), but I don't know how to do it because the waveform argument only takes a single waveform. Also, I'm not really sure whether doing the operation in batches will speed up the process.
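One possible way around the batching question, sketched here in plain NumPy rather than TensorFlow: consecutive 1-second clips start 1920 samples apart, which is exactly 8 strides of 240, so the short-time Fourier transform can be computed once over the whole waveform and each clip's spectrogram taken as a slice of 199 rows. The function names, the Hann window, and the FFT length of 512 are assumptions made for illustration, and the values are not guaranteed to match audio_spectrogram() exactly.

```python
import numpy as np

def full_spectrogram(waveform, window_size=480, stride=240, fft_length=512):
    """Magnitude STFT of a mono waveform, shape [num_frames, fft_length // 2 + 1]."""
    num_frames = 1 + (len(waveform) - window_size) // stride
    # Index matrix: row k selects samples k*stride .. k*stride + window_size - 1.
    idx = stride * np.arange(num_frames)[:, None] + np.arange(window_size)[None, :]
    frames = waveform[idx] * np.hanning(window_size)
    return np.abs(np.fft.rfft(frames, n=fft_length, axis=1))

def clip_spectrogram(spec, i, sample_rate=48000, fps=25, window_size=480, stride=240):
    """Slice out the spectrogram of the 1-second clip starting at video frame i.

    Assumes the per-frame sample offset is a whole multiple of the stride.
    """
    first = i * (sample_rate // fps) // stride           # 8 * i for these parameters
    count = 1 + (sample_rate - window_size) // stride    # 199 rows per clip
    return spec[first:first + count]

# Demo on 3 seconds of synthetic audio (random noise stands in for a real file).
rng = np.random.default_rng(0)
waveform = rng.standard_normal(3 * 48000).astype(np.float32)
spec = full_spectrogram(waveform)

# Compare the slice for video frame 7 with a spectrogram computed directly on that clip.
start = 7 * 48000 // 25
direct = full_spectrogram(waveform[start:start + 48000])
sliced = clip_spectrogram(spec, 7)
print(sliced.shape, np.allclose(direct, sliced))
```

With this layout the transform runs once per file instead of once per video frame, and the per-frame work reduces to slicing plus whatever post-processing and encoding is needed. For a 10-minute file the index matrix gets large, so the waveform would probably need to be processed in chunks.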

I don't think the video format matters. I just need the audio part (which is extracted into a `.wav` file) and the video frame rate (to calculate the start index of the audio samples) to compute the spectrogram. — Ronald Sumbayak, Mar 29 '19 at 14:49
If your approach uses only 25% CPU, you can run 4 programs in parallel, each on a single video. Quite cheap version of multiprocessing. — Thomas Weller, Mar 29 '19 at 16:44
Should it be done using TensorFlow? Is CUDA enabled/used? You can parallelize the for loop as well. — gustavovelascoh, Mar 29 '19 at 16:50
@gustavovelascoh Actually, no, at least for now. I am using TensorFlow because the spectrogram was previously computed before the augmentation process, which also uses the TensorFlow API, so that everything could run in the same graph and make better use of the GPU (yes, CUDA is being used). But now the spectrogram computation has been moved out of the augmentation process. I am also looking at other libraries that might be able to replace it, but I still have hope for TensorFlow. — Ronald Sumbayak, Mar 29 '19 at 18:46
@ThomasWeller Is that the only possible solution? Even if I run several instances of the program in parallel, it would still take days (or maybe weeks?) to process all the videos. Is there a way to speed up the computation at the TensorFlow level? — Ronald Sumbayak, Mar 29 '19 at 19:06

score 0 · Answer 1 · answered Jan 16 '20 at 07:45

Check out the spela project. It is an implementation of speech processing features inside TensorFlow Keras layers. The package helps to speed up audio processing by utilizing the GPU, and it also allows easy conversion to TFLite.

import tensorflow as tf
from spela.spectrogram import Spectrogram

# Define a Sequential model
model = tf.keras.Sequential()

# Add a layer to compute Spectrogram, returns a 2D image
model.add(Spectrogram(n_dft=512, n_hop=256, input_shape=(height, width),
                      return_decibel_spectrogram=True, power_spectrogram=2.0,
                      trainable_kernel=False, name='static_stft'))

model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.001),
              loss="categorical_crossentropy",
              metrics=[tf.keras.metrics.categorical_accuracy])
