I have tried a couple of different approaches to generating a grayscale PNG of the spectrogram (DTFT) of a video file's audio track, but my results look nothing like what other people are posting. Instead of plotting to the screen with matplotlib (as most examples do), I am trying to write a PNG with scikit-image (which I trust after using it in other projects).
This is my code, which requires an mp4 file (here I'm using Aphex Twin's song with the notorious Demon Face hidden in the DTFT). I'm specifically interested in getting this working with the av library (PyAV), and I am sufficiently convinced that it is reading the file correctly and producing a numpy array of floating-point numbers.
import numpy as np
import av
import skimage.io
import scipy.signal as signal
container = av.open("tmp/aphex.mp4")
frames = container.streams.audio[0].frames
chunk = container.streams.audio[0].frame_size // 2 # bug in av?
rate = container.streams.audio[0].rate
fltp = np.zeros((frames, chunk), dtype=float)
for n, frame in enumerate(container.decode(audio=0)):
    fltp[n, :] = np.frombuffer(frame.planes[0], dtype=float)
fltp = fltp.flatten()
# check that it worked by playing it (just one channel)
fltp.tofile("tmp/aphex.raw")
# play -t raw -r 44100 -e floating-point -b 32 -c 1 tmp/aphex.raw
custom_chunk = 4096
freqs, times, data = signal.spectrogram(fltp,
                                        fs=rate,
                                        nperseg=custom_chunk,
                                        detrend="linear")
data = data - np.min(data)
data = data / np.max(data)
data = 1 - data
skimage.io.imsave("tmp/aphex.png", data)
But this produces a very sparse image (almost entirely blank space), and if I add the following lines
data = np.log10(data)
data[data == -np.inf] = 0
to introduce a log scale (as many do) then it looks even weirder (cropped because of 2MB upload limit)
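For comparison, here is a minimal sketch of one common way to put a power spectrogram on a log scale without running into `-inf`: floor the values relative to the peak, convert to decibels, and only then normalise to 8-bit grayscale. The function name `power_to_gray` and the 80 dB dynamic range are assumptions chosen for illustration; they do not come from the code above.

```python
import numpy as np

def power_to_gray(power, dynamic_range_db=80.0):
    """Map a non-negative power spectrogram to uint8 grayscale on a dB scale.

    `power_to_gray` and `dynamic_range_db` are hypothetical names for this sketch.
    """
    power = np.asarray(power, dtype=np.float64)
    # Floor the values relative to the peak so log10 never sees zero.
    peak = max(power.max(), np.finfo(np.float64).tiny)
    floor = peak * 10.0 ** (-dynamic_range_db / 10.0)
    db = 10.0 * np.log10(np.maximum(power, floor))
    # After flooring, db spans roughly [db.max() - dynamic_range_db, db.max()].
    scaled = (db - (db.max() - dynamic_range_db)) / dynamic_range_db
    scaled = np.clip(scaled, 0.0, 1.0)
    # Invert so loud bins are dark (as in the original code), and flip the
    # frequency axis so low frequencies end up at the bottom of the image.
    gray = np.round(255.0 * (1.0 - scaled)).astype(np.uint8)
    return gray[::-1, :]

# Tiny synthetic example: rows are frequency bins, columns are time slices.
example = np.array([[1.0, 0.0],
                    [1e-4, 1e-8]])
image = power_to_gray(example)
```

The resulting `uint8` array could then be passed to `skimage.io.imsave` in place of the float array. Normalising after the log step, rather than before, keeps quiet bins from collapsing into a single value.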
I've tried lots of other things, like trying to normalise per column (which looks slightly better, but still weird) but my spectrograms still look nothing like they are supposed to.
Does anybody know what I'm doing wrong?
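One assumption in the code above that may be worth double-checking is the sample width. The `play` command treats the dump as 32-bit floats, and the `fltp` sample format is 32-bit planar float, whereas `dtype=float` in NumPy means 64-bit. If that mismatch applies here, it could also explain why `frame_size // 2` appeared to be necessary, and the raw dump would likely still play correctly because `tofile` writes the same bytes back out. The following is a NumPy-only sketch of the effect, using a synthetic sine wave as a stand-in for one decoded plane (no av involved):

```python
import numpy as np

# Stand-in for one decoded plane: 1024 samples of 32-bit float audio.
samples = np.sin(2 * np.pi * 440.0 * np.arange(1024) / 44100.0).astype(np.float32)
raw = samples.tobytes()

# Interpreting the bytes with the matching width recovers every sample.
as_f32 = np.frombuffer(raw, dtype=np.float32)

# Python's `float` means float64 in NumPy: each pair of 32-bit samples is
# fused into one meaningless 64-bit value, halving the apparent length.
as_f64 = np.frombuffer(raw, dtype=float)

print(as_f32.size, as_f64.size)
print(np.array_equal(as_f32, samples))
```

If the sizes differ by a factor of two like this on real frames, reading with `np.float32` (and dropping the `// 2`) would be the thing to try before changing anything in the spectrogram step.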
(I also tried using numpy.fft.rfft / scipy directly on each chunk... my images looked pretty much the same as these ones. I've also tried a few different movies / songs)
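For reference, a minimal sketch of the per-chunk `numpy.fft.rfft` approach, checked against a synthetic 1 kHz tone rather than a real file. The helper name `stft_power`, the Hann window, and the 50% overlap are assumptions for this sketch and are not taken from the attempts described above.

```python
import numpy as np

def stft_power(samples, nperseg=4096, hop=2048):
    """Power spectrogram from windowed chunks (hypothetical helper).

    Returns an array of shape (nperseg // 2 + 1, n_frames).
    """
    samples = np.asarray(samples, dtype=np.float64)
    window = np.hanning(nperseg)  # taper each chunk to reduce leakage
    n_frames = 1 + (len(samples) - nperseg) // hop
    columns = []
    for i in range(n_frames):
        chunk = samples[i * hop : i * hop + nperseg]
        spectrum = np.fft.rfft(chunk * window)
        columns.append(np.abs(spectrum) ** 2)
    # Transpose so rows are frequency bins and columns are time slices.
    return np.array(columns).T

# Sanity check with a known signal: one second of a 1 kHz sine at 44.1 kHz.
rate = 44100
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 1000.0 * t)

power = stft_power(tone)
freqs = np.fft.rfftfreq(4096, d=1.0 / rate)
peak_hz = freqs[np.argmax(power[:, 0])]
print(power.shape, peak_hz)
```

Running a known tone through the pipeline first makes it easier to tell whether a strange image comes from the transform or from the decoding step that feeds it.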