I am building a tool which is supposed to run on a server and analyze sound files. I want to do this in Ruby as all my other tools are written in Ruby as well. But I am having trouble finding a good way of accomplishing this.
A lot of the examples I've found has been doing visualizers and graphical stuff. I just need the FFT data, nothing more. I need to both get the audio data, and do a FFT on it. My end goal is to calculate some stuff like the mean/median/mode, 25th-percentile, and 75th-percentile over all frequencies (weighted amplitude), the BPM, and perhaps some other good characteristic to later be able to cluster similar sounds together.
First I tried to use ruby-audio and fftw3 but I never go the two to really work together. The documentation wasn't good either so I really didn't know what data was being shuffled around. Next I tried to use bplay / brec and limit my Ruby script to just use STDIN and perform an FFT on that (still using fftw3). But I couldn't get bplay/brec to work since the server doesn't have a sound card and I didn't manage to just get the audio directly to STDOUT without going to an audio device first.
Here's the closest I've gotten:
# extracting audio from wav with ruby-audio
buf = RubyAudio::Buffer.float(1024)
RubyAudio::Sound.open(fname) do |snd|
while snd.read(buf) != 0
# ???
end
end
# performing FFT on audio
def get_fft(input, window_size)
data = input.read(window_size).unpack("s*")
na = NArray.to_na(data)
fft = FFTW3.fft(na).to_a[0, window_size/2]
return fft
end
So now I'm stuck and can't find any more good results on Google. So perhaps you SO guys can help me out?
Thanks!