Sorry if this is a really obvious question. I am using matplotlib to generate some spectrograms for use as training data in a machine learning model. The spectrograms are of short clips of music and I want to simulate speeding up or slowing down the song by a random amount to create variations in the data. I have shown my code below for generating each spectrogram. I have temporarily modified it to produce 2 images starting at the same point in the song, one with variation and one without, in order to compare them and see if it is working as intended.
from pydub import AudioSegment
import matplotlib.pyplot as plt
import numpy as np
BPM_VARIATION_AMOUNT = 0.2
FRAME_RATE = 22050
CHUNK_SIZE = 2
BUFFER = FRAME_RATE * 5
def generate_random_specgram(track):
# Read audio data from file
audio = AudioSegment.from_file(track.location)
audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
samples = audio.get_array_of_samples()
start = np.random.randint(BUFFER, len(samples) - BUFFER)
chunk = samples[start:start + int(CHUNK_SIZE * FRAME_RATE)]
# Plot specgram and save to file
filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, track.bpm))
plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
plt.axis('off')
plt.specgram(chunk, Fs = FRAME_RATE)
plt.savefig(filename)
plt.close()
# Perform random variations to the BPM
frame_rate = FRAME_RATE
bpm = track.bpm
variation = 1 - BPM_VARIATION_AMOUNT + (
np.random.random() * BPM_VARIATION_AMOUNT * 2)
bpm *= variation
bpm = round(bpm, 2)
# I thought this next line should have been /= but that stretched the wrong way?
frame_rate *= (bpm / track.bpm)
# Read audio data from file
chunk = samples[start:start + int(CHUNK_SIZE * frame_rate)]
# Plot specgram and save to file
filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, bpm))
plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
plt.axis('off')
plt.specgram(chunk, Fs = frame_rate)
plt.savefig(filename)
plt.close()
I thought by changing the Fs parameter given to the specgram function this would stretch the data along the x-axis but instead it seems to be resizing the whole graph and introducing white space at the top of the image in strange and unpredictable ways. I'm sure I'm missing something but I can't see what it is. Below is an image to illustrate what I'm getting.