I am building a visual equalizer for audio and am confused about what the output of my FFT represents. My end goal is to send a simplified array of 6 numbers (1 bass, 4 mid-tones, and 1 treble) to an Arduino equipped with Bluetooth. Each number will denote how many LEDs to light up in one column (1 column for bass, 1 column for treble, etc.).
The first step is to turn the audio signal into a numerical representation. To do that, I want to combine ranges of frequencies into discrete buckets at regular time intervals, e.g. one bucket for bass covering 60 to 250 Hz.
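As a rough sketch of that bucketing step: bin k of an N-point FFT is centred on k * sample_rate / N Hz, so each band corresponds to a contiguous run of bin indices. The method names (`bin_frequency`, `bins_for`, `band_levels`) and every band edge other than the 60 to 250 Hz bass range are hypothetical placeholders, not values from this post.

```ruby
# Hypothetical band edges in Hz as [low, high) pairs. Only the 60-250 Hz bass
# range comes from the text above; the rest are placeholders to tune by ear.
BANDS = [
  [60, 250],      # bass
  [250, 500],     # mid 1
  [500, 1000],    # mid 2
  [1000, 2000],   # mid 3
  [2000, 4000],   # mid 4
  [4000, 16_000]  # treble
].freeze

# Centre frequency in Hz of FFT bin `index`.
def bin_frequency(index, sample_rate: 44_100, window: 2048)
  index * sample_rate.to_f / window
end

# Indices of the bins whose centre frequency falls in [low_hz, high_hz).
def bins_for(low_hz, high_hz, sample_rate: 44_100, window: 2048)
  (0...window / 2).select do |i|
    f = bin_frequency(i, sample_rate: sample_rate, window: window)
    f >= low_hz && f < high_hz
  end
end

# Collapse one window of magnitudes (length window / 2) into one level per band
# by summing the magnitudes of the bins that belong to it.
def band_levels(magnitudes, bands = BANDS, sample_rate: 44_100, window: 2048)
  bands.map do |low_hz, high_hz|
    bins = bins_for(low_hz, high_hz, sample_rate: sample_rate, window: window)
    bins.sum { |i| magnitudes[i] }
  end
end

puts "bass bins: #{bins_for(60, 250).inspect}"
```

Summing is only one possible choice; averaging or taking the maximum per band would work with the same bin ranges and may scale better to a fixed number of LEDs.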
I've obtained a 300 Hz WAV file that I am trying to decompose into its frequency components using the Ruby FFTW3 gem. I would expect one sine wave that completes 300 periods over the course of a 1 second sample. When I pass in a 1 s sample of a 300 Hz tone, I get fft.length = 1024 and fft[0] = 22528.
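For comparison, here is a minimal pure-Ruby sketch of what a clean 300 Hz tone should produce, assuming a 2048-sample window at 44100 Hz. It uses a naive DFT instead of FFTW3 so that it runs without any gems; `dft_magnitude` is a hypothetical helper, not part of any library. With about 21.5 Hz per bin, 300 Hz falls at roughly bin 13.9, so the peak should land in bin 14.

```ruby
# Magnitude of bin k of a naive DFT over `samples`.
# O(N) per bin, which is slow in general but fine for a spot check.
def dft_magnitude(samples, k)
  n = samples.length
  re = 0.0
  im = 0.0
  samples.each_with_index do |x, t|
    angle = 2.0 * Math::PI * k * t / n
    re += x * Math.cos(angle)
    im -= x * Math.sin(angle)
  end
  Math.sqrt(re * re + im * im)
end

sample_rate = 44_100
window = 2048
tone_hz = 300.0

# One window of a pure 300 Hz sine wave with amplitude 1.
samples = Array.new(window) do |t|
  Math.sin(2.0 * Math::PI * tone_hz * t / sample_rate)
end

# The first 64 bins cover 0 Hz to about 1357 Hz, enough to find a 300 Hz peak.
magnitudes = (0...64).map { |k| dft_magnitude(samples, k) }
peak_bin = magnitudes.index(magnitudes.max)
peak_hz = peak_bin * sample_rate.to_f / window

puts "peak at bin #{peak_bin} (about #{peak_hz.round(1)} Hz)"
```

Because 300 Hz does not sit exactly on a bin centre, some energy leaks into the neighbouring bins, so the bins around the peak are small but not zero.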
I have been using the conversations "Audio Equalizer in Ruby" and "Extract Fast Fourier Transform data from file" as my main points of reference, as the documentation for the Ruby gem is confusing.
Here's my code:
require "ruby-audio"
require "fftw3"
require "narray"
# Audio sample rate and block size:
SAMPLE_RATE = 44100
# Break the audio into chunks (called windows, or frames) and
# pass them sequentially to the FFT.
# This gives a frequency profile that changes over time.
# Typical window sizes are powers of two, e.g. 1024, 2048, 4096, 8192.
WINDOW = 2048
# Frequency resolution: SAMPLE_RATE / WINDOW = 44100 / 2048 = about 21.5 Hz per bin
# Each window spans WINDOW / SAMPLE_RATE = about 46 ms, so about 21.5 updates/second
# RESOLUTION = (1.0*SAMPLE_RATE/WINDOW)
filename = ARGV[0]
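wave = Array.new # raw samples from every window, concatenated
# One array per frequency bin (WINDOW/2 bins). The block form gives each bin
# its own array; Array.new(WINDOW/2, []) would share a single array across all bins.
fft = Array.new(WINDOW/2) { [] }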
begin
  # extracting audio from wav with ruby-audio
  buf = RubyAudio::Buffer.float(WINDOW)
  RubyAudio::Sound.open(filename) do |snd|
    while snd.read(buf) != 0
      wave.concat(buf.to_a)
      # na = array to be transformed
      na = NArray.to_na(buf.to_a)
      # keep only the first WINDOW/2 bins of the transform
      fft_slice = FFTW3.fft(na).to_a[0, WINDOW/2]
      fft_slice.each_with_index do |x, j|
        # magnitude (absolute value) of the complex number, not its real part
        fft[j] << x.abs
      end
    end
  end
rescue => err
  # log.error "error reading audio file: " + err
  puts 'There was an error, exiting!'
  exit
end
Do the innermost arrays denote frequencies and the outer array the passage of time, or is it the other way around? How do I know which array index represents a specific frequency?
I am unsure how to test whether the output correctly identifies the frequency in the file. Is there a good way to look at the data that I've missed, visually or otherwise?
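One low-tech way to look at the data is a text bar chart of a single window, one row per bin, labelled with the bin's centre frequency. The sketch below assumes `magnitudes` holds one window's `abs` values indexed by bin; `spectrum_rows` is a hypothetical helper and the demo values are made up for illustration, not measured output.

```ruby
# Render magnitudes as text bars, one row per bin, scaled to the largest value.
# Each row shows the bin index, its centre frequency in Hz, and a bar.
def spectrum_rows(magnitudes, sample_rate: 44_100, window: 2048, width: 40)
  peak = magnitudes.max
  peak = 1.0 if peak.nil? || peak.zero? # avoid dividing by zero on silence
  magnitudes.each_with_index.map do |m, i|
    hz = i * sample_rate.to_f / window
    bar = "#" * (m.to_f / peak * width).round
    format("%4d %8.1f Hz %s", i, hz, bar)
  end
end

# Made-up magnitudes for bins 0..19, shaped like a tone near 300 Hz.
demo = Array.new(20, 0.0)
demo[13] = 0.1
demo[14] = 1.0
demo[15] = 0.1

puts spectrum_rows(demo)
```

For a pure tone, a single long bar should stand out at the expected frequency; a bar at the wrong frequency or a flat spread across all bins suggests a problem earlier in the pipeline.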
Thanks for any pointers!