
I want to scale a PNG image in response to an audio track, given a frequency range (20 Hz to 1000 Hz, for example) and a threshold, with smooth transitions. For example, when a kick hits, the image should scale smoothly up to 120%. I would like to make audio visualizers like the ones used for dubstep tracks, where the image "pumps" whenever a kick comes in. First, is this doable with ffmpeg? Where should I start? I found showcqt, which takes frequencies as input, but its output is a video, so I don't think I can use it in my case. Any help is appreciated.

  • Does ffmpeg allow you access to the individual PCM values of the audio file, as they are processed? If yes and if you are willing to roll your own, I will answer with a couple suggestions for algorithms. – Phil Freihofner Jul 10 '22 at 18:42
  • From what I've seen, I can have access to PCM values. I am using ffmpeg as a command inside a Python script, so I can also have access to individual PCM values (easier than ffmpeg I think). Can't wait to hear some suggestions, thank you for helping me. – UnlockeerFromFrance Jul 10 '22 at 23:38

1 Answer


If you are able to read the PCM values as they are being output, then you might consider using a rolling RMS average to get a continuous stream of amplitudes. I don't know the best length for the window. Perhaps it should correspond to the number of audio frames that gives you one update per visual frame? The folks at the DSP site would have the best insights.

If you do a rolling average, the computations are not terribly expensive. Square each incoming sample, add it to a ring buffer (circular queue), and drop the oldest value. Only those two values need to be added to and subtracted from the running sum when computing the new average, since the denominator is fixed and known. The square root of that mean is the RMS. I found a video that describes the basic RMS math here using Matlab.
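As a rough illustration of the ring-buffer idea, here is a minimal Python sketch. The class name `RollingRMS`, the `push` method, and the assumption that samples are floats in the range -1.0 to 1.0 are mine, not anything ffmpeg provides. The denominator stays fixed at the window size, so the value ramps up from zero while the buffer is still filling.

```python
import math
from collections import deque


class RollingRMS:
    """Rolling RMS over the last `window` samples, O(1) work per sample.

    Hypothetical helper for illustration; not part of ffmpeg or any library.
    """

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.squares = deque()   # ring buffer holding squared samples
        self.sum_squares = 0.0   # running sum of everything in the buffer

    def push(self, sample):
        """Add one PCM sample and return the current rolling RMS."""
        sq = sample * sample
        self.squares.append(sq)
        self.sum_squares += sq
        if len(self.squares) > self.window:
            # Drop the outgoing value instead of re-summing the whole buffer.
            self.sum_squares -= self.squares.popleft()
        # Fixed denominator; max() guards against tiny negative rounding error.
        return math.sqrt(max(self.sum_squares, 0.0) / self.window)
```

One possible choice for the window, following the "one update per visual frame" idea, is `sample_rate // fps`, for example `44100 // 30 = 1470` samples. Those numbers are only an example; the right length depends on how reactive the effect should feel.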

It might be necessary to add some smoothing to the visualizer that receives the volume updates. Also, handing off data from the audio thread should likely employ some form of loose coupling. It would not be good if the thread that processes the audio were also handling graphics.
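For the smoothing part, one simple option (an assumption on my side, not necessarily what real visualizers use) is a one-pole filter with separate attack and release coefficients. The sketch below takes one level per video frame, compares it against a threshold, and eases the scale toward 120% or back toward 100%. The function name `smooth_scale` and all default values are made up for illustration.

```python
def smooth_scale(levels, threshold=0.2, max_scale=1.2, attack=0.6, release=0.1):
    """Turn per-frame levels into smoothed image scale factors.

    levels    -- one amplitude value per video frame (e.g. a rolling RMS)
    threshold -- level at or above which the image should "pump"
    max_scale -- target scale while above the threshold (1.2 = 120%)
    attack    -- fraction of the remaining distance covered per frame when growing
    release   -- fraction of the remaining distance covered per frame when shrinking
    """
    scale = 1.0
    scales = []
    for level in levels:
        target = max_scale if level >= threshold else 1.0
        # Grow quickly on a hit, relax slowly afterwards.
        coeff = attack if target > scale else release
        scale += coeff * (target - scale)
        scales.append(scale)
    return scales
```

The resulting list has one scale factor per frame, so it could be applied when rendering each frame of the PNG. A larger `attack` makes the hit snappier, and a smaller `release` makes the image settle back more slowly.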

I'm a little over my head, but I think this is what is generally done for visualizers.

Phil Freihofner