
Sorry if this is a really obvious question. I am using matplotlib to generate some spectrograms for use as training data in a machine learning model. The spectrograms are of short clips of music and I want to simulate speeding up or slowing down the song by a random amount to create variations in the data. I have shown my code below for generating each spectrogram. I have temporarily modified it to produce 2 images starting at the same point in the song, one with variation and one without, in order to compare them and see if it is working as intended.

from pydub import AudioSegment
import matplotlib.pyplot as plt
import numpy as np

BPM_VARIATION_AMOUNT = 0.2
FRAME_RATE = 22050
CHUNK_SIZE = 2
BUFFER = FRAME_RATE * 5

def generate_random_specgram(track):
    # Read audio data from file
    audio = AudioSegment.from_file(track.location)
    audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
    samples = audio.get_array_of_samples()
    start = np.random.randint(BUFFER, len(samples) - BUFFER)
    chunk = samples[start:start + int(CHUNK_SIZE * FRAME_RATE)]

    # Plot specgram and save to file
    filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, track.bpm))
    plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
    plt.axis('off')
    plt.specgram(chunk, Fs = FRAME_RATE)
    plt.savefig(filename)
    plt.close()

    # Perform random variations to the BPM
    frame_rate = FRAME_RATE
    bpm = track.bpm
    variation = 1 - BPM_VARIATION_AMOUNT + (
        np.random.random() * BPM_VARIATION_AMOUNT * 2)
    bpm *= variation
    bpm = round(bpm, 2)
    # I thought this next line should have been /= but that stretched the wrong way?
    frame_rate *= (bpm / track.bpm) 

    # Read audio data from file
    chunk = samples[start:start + int(CHUNK_SIZE * frame_rate)]

    # Plot specgram and save to file
    filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, bpm))
    plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
    plt.axis('off')
    plt.specgram(chunk, Fs = frame_rate)
    plt.savefig(filename)
    plt.close()

I thought by changing the Fs parameter given to the specgram function this would stretch the data along the x-axis but instead it seems to be resizing the whole graph and introducing white space at the top of the image in strange and unpredictable ways. I'm sure I'm missing something but I can't see what it is. Below is an image to illustrate what I'm getting.

[Image: Spectrogram Examples]

cainy393

2 Answers


The frame rate is a fixed number that depends only on your data. If you change it, you do effectively "stretch" the x-axis, but in the wrong way. For example, if you have 1000 data points that correspond to 1 second, your frame rate (or, better, sampling frequency) is 1000. If your signal is a simple 200 Hz sine whose frequency increases slightly over time, the specgram will be:

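import numpy as np
import matplotlib.pyplot as plt

# 1000 samples spanning 1 second, so the true sampling frequency is 1000 Hz
t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)

frame_rate = 1000
plt.specgram(signal, Fs=frame_rate);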


If you change the frame rate, both the x-axis and y-axis scales will be wrong. With the frame rate set to 500 you get:

# Same signal as before, but labelled with half the true sampling frequency
t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)

frame_rate = 500
plt.specgram(signal, Fs=frame_rate);


The plot looks very similar, but this time it is wrong: the x-axis spans almost 2 seconds when it should span only 1, and the starting frequency reads 100 Hz instead of 200 Hz.
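
The rescaling can be checked without plotting. The sketch below is a NumPy-only approximation of how the spectrogram axes are derived, assuming the documented plt.specgram defaults of NFFT=256 and noverlap=128. The helper name specgram_axes is hypothetical and is not part of matplotlib.

```python
import numpy as np

def specgram_axes(n_samples, fs, nfft=256, noverlap=128):
    """Return (freqs, times) labels for a spectrogram of n_samples points.

    Hypothetical helper: it assumes one-sided FFT bins and segment
    centres as time labels, with NFFT=256 and noverlap=128.
    """
    # Frequency label of each FFT bin, from 0 up to fs / 2
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    # First sample index of each overlapping segment
    step = nfft - noverlap
    starts = np.arange(0, n_samples - nfft + 1, step)
    # Time label of each segment: its centre sample divided by fs
    times = (starts + nfft / 2) / fs
    return freqs, times

# Same 1000 samples, labelled with the true rate and with half of it
freqs_1000, times_1000 = specgram_axes(1000, fs=1000)
freqs_500, times_500 = specgram_axes(1000, fs=500)

print("top frequency:", freqs_1000[-1], "vs", freqs_500[-1])
print("last time label:", times_1000[-1], "vs", times_500[-1])
```

Halving Fs halves every frequency label and doubles every time label, which matches the plot: the 200 Hz tone is read as 100 Hz and the 1-second clip appears to last almost 2 seconds.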


To conclude, the sampling frequency you set needs to be the correct one. If you want to stretch the plot, you can restrict the visible time range with something like plt.xlim(0.2, 0.4). If you want to avoid the white band at the top of the plot, you can manually set the ylim to half the frame rate:

plt.ylim(0, frame_rate/2)

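This works because of simple properties of the Fourier transform and the Nyquist-Shannon theorem: the highest frequency the transform can represent is half the sampling frequency, so the spectrogram contains no data above frame_rate/2.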
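
As a rough illustration of that limit, the following sketch uses plain NumPy FFTs rather than plt.specgram. It samples two sine tones at 1000 Hz, one below and one above half the sampling frequency, and reports where each peak lands. The tone frequencies and the helper name peak_frequency are assumptions chosen for the example.

```python
import numpy as np

fs = 1000                # sampling frequency in Hz
n = 1000                 # one second of samples
t = np.arange(n) / fs    # sample times in seconds

def peak_frequency(tone_hz):
    """Frequency (Hz) of the strongest FFT bin for a sine at tone_hz."""
    x = np.sin(2 * np.pi * tone_hz * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

# Highest frequency bin available at this sampling frequency
nyquist = np.fft.rfftfreq(n, d=1.0 / fs)[-1]

below = peak_frequency(200)   # tone below fs / 2
above = peak_frequency(700)   # tone above fs / 2

print("highest bin:", nyquist)
print("200 Hz tone peaks at:", below)
print("700 Hz tone peaks at:", above)
```

The 700 Hz tone cannot be represented at this sampling frequency, so it folds back below fs / 2 instead of appearing at its true frequency. Nothing exists above that limit, which is why the y-axis can be clipped there.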

Andrea
  • Yeah the reason I was changing the frame rate was to artificially alter the speed of the music. By increasing the frame rate 10% that would be the same as speeding the music up by 10%. The solution was indeed to use xlim and ylim to correctly set the axis though! Why is the ylim half the framerate though? I came to this conclusion but had no idea why this was correct. – cainy393 Jan 07 '20 at 17:28
  • I think I understand from reading the wikipedia article you linked but the maths involved is slightly over my head so I'm not 100% sure of it. – cainy393 Jan 07 '20 at 20:12
  • Yes, it is indeed because of the wiki article I linked. The main message is that the maximum frequency you can get from a Fourier transform is half the sample rate. If you go higher you will just find the same plot repeating with no additional information. – Andrea Jan 07 '20 at 20:36

The solution to my problem was to set the xlim and ylim of the plot. Here is the code from my testing file in which I finally got rid of all the odd whitespace:

from pydub import AudioSegment
import numpy as np
import matplotlib.pyplot as plt

BUFFER = 5
FRAME_RATE = 22050
SAMPLE_LENGTH = 2

def plot(audio_file, bpm, variation=1):
    audio = AudioSegment.from_file(audio_file)
    audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
    samples = audio.get_array_of_samples()
    chunk_length = int(FRAME_RATE * SAMPLE_LENGTH * variation)
    start = np.random.randint(
        BUFFER * FRAME_RATE,
        len(samples) - (BUFFER * FRAME_RATE) - chunk_length)
    chunk = samples[start:start + chunk_length]

    plt.figure(figsize=(5.12, 2.56)).add_axes([0, 0, 1, 1])
    plt.specgram(chunk, Fs=FRAME_RATE * variation)
    plt.xlim(0, SAMPLE_LENGTH)
    plt.ylim(0, FRAME_RATE / 2 * variation)
    plt.savefig('specgram-%f.png' % (bpm * variation))
    plt.close()
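
The arithmetic behind those limits can be separated from the plotting. This is only a sketch: the helper stretch_params and the random generator rng are hypothetical names, and the variation range reuses the BPM_VARIATION_AMOUNT of 0.2 from the question. It shows that taking more samples and labelling them with a proportionally higher Fs keeps the plotted duration at SAMPLE_LENGTH seconds.

```python
import numpy as np

FRAME_RATE = 22050           # samples per second of the decoded audio
SAMPLE_LENGTH = 2            # seconds shown in each spectrogram
BPM_VARIATION_AMOUNT = 0.2   # maximum fractional speed change (from the question)

def stretch_params(variation):
    """Hypothetical helper: values to pass to specgram, xlim and ylim."""
    # A sped-up clip covers more source samples in the same plotted time
    chunk_length = int(FRAME_RATE * SAMPLE_LENGTH * variation)
    # Labelling those samples with a higher rate squeezes them into SAMPLE_LENGTH
    effective_fs = FRAME_RATE * variation
    return {
        "chunk_length": chunk_length,
        "Fs": effective_fs,
        "xlim": (0, SAMPLE_LENGTH),
        "ylim": (0, effective_fs / 2),
    }

# Draw a random speed factor between 0.8 and 1.2
rng = np.random.default_rng(0)
variation = rng.uniform(1 - BPM_VARIATION_AMOUNT, 1 + BPM_VARIATION_AMOUNT)

params = stretch_params(variation)
duration = params["chunk_length"] / params["Fs"]
print("variation:", variation)
print("plotted duration in seconds:", duration)
```

Whatever the speed factor, the plotted duration stays at about 2 seconds and the top of the y-axis tracks half the effective sampling frequency, so neither axis is left with blank space.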
cainy393