The Web Audio API has an analyser node that lets you get FFT data for the audio you're working with, with byte and float variants for retrieving the data. The byte version makes some sense: it returns what looks like a normalized intensity spectrum (scaled by the min and max decibel values), with 0 meaning no component of the audio in a given frequency bin and 255 being the maximum.

But I'd like more detail than 8 bits. The float version, however, gives weird results.

const freqData = new Float32Array(analyser.frequencyBinCount);
analyser.getFloatFrequencyData(freqData);

This gives me values between -891.048828125 and 0. -891 shows up corresponding to silence, so it's somehow the minimum value while I'm guessing 0 is equivalent to the max value.

What's going on? Why is -891.048828125 significant at all? Why is a large negative number silence and zero the maximum? Am I using the wrong typed array, or is something misconfigured? Using a Float64Array gives 0 values.

happy coder
Newmu

3 Answers

Since there seems to be zero documentation on what the data actually represents, I looked into the relevant source code of webkit: RealtimeAnalyser.cpp

Short answer: subtract analyser.minDecibels from every value of the Float32Array to get non-negative numbers for anything at or above minDecibels, then divide by (analyser.maxDecibels - analyser.minDecibels) to get a 0-to-1 representation similar to what getByteFrequencyData returns (which additionally scales to 0-255), just with more resolution.
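
A minimal sketch of that rescaling, assuming the float data is in decibels as described in this answer. The helper name normalizeDecibels is hypothetical, not part of the Web Audio API. The clamping step is an addition: getFloatFrequencyData does not clamp its output, so values below minDecibels (or above maxDecibels) can appear and would otherwise fall outside 0..1.

```javascript
// Hypothetical helper (not a Web Audio API function): maps dB values onto
// the 0..1 range spanned by minDb..maxDb, clamping anything outside it.
function normalizeDecibels(dbValues, minDb, maxDb) {
  // Guard against a zero-width range, mirroring the WebKit excerpt quoted in this answer.
  const range = maxDb === minDb ? 1 : maxDb - minDb;
  const out = new Float32Array(dbValues.length);
  for (let i = 0; i < dbValues.length; i++) {
    const scaled = (dbValues[i] - minDb) / range;
    out[i] = Math.min(1, Math.max(0, scaled));
  }
  return out;
}

// Example input using the WebKit default range of -100 dB to -30 dB.
const sampleDb = new Float32Array([-891.05, -100, -65, -30, 0]);
const normalized = normalizeDecibels(sampleDb, -100, -30);
```

In a browser you would call it as normalizeDecibels(freqData, analyser.minDecibels, analyser.maxDecibels) after getFloatFrequencyData has filled freqData. Multiplying the result by 255 gives values comparable to getByteFrequencyData.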

Long answer:

Both getByteFrequencyData and getFloatFrequencyData give you the magnitude in decibels. The only difference is the scaling: getByteFrequencyData subtracts minDecibels and maps the resulting range onto 0-255, while getFloatFrequencyData returns the raw decibel value.

Relevant code in webkit for getByteFrequencyData:

const double rangeScaleFactor = m_maxDecibels == m_minDecibels ? 1 : 1 / (m_maxDecibels - m_minDecibels);
float linearValue = source[i];
double dbMag = !linearValue ? minDecibels : AudioUtilities::linearToDecibels(linearValue);

// The range m_minDecibels to m_maxDecibels will be scaled to byte values from 0 to UCHAR_MAX.
double scaledValue = UCHAR_MAX * (dbMag - minDecibels) * rangeScaleFactor;

Relevant code in webkit for getFloatFrequencyData:

float linearValue = source[i];
double dbMag = !linearValue ? minDecibels : AudioUtilities::linearToDecibels(linearValue);
destination[i] = float(dbMag);
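
The excerpts above do not show the body of AudioUtilities::linearToDecibels. A minimal sketch, assuming it uses the usual amplitude convention of 20·log10(x); under that assumption the float output can be converted back to linear magnitudes with the inverse. Both function names below are illustrative.

```javascript
// Assumed convention: dB = 20 * log10(linear amplitude).
function linearToDecibels(linear) {
  return 20 * Math.log10(linear);
}

// Inverse of the above: recovers the linear magnitude from a dB value.
function decibelsToLinear(db) {
  return Math.pow(10, db / 20);
}

// A full-scale magnitude of 1 is 0 dB; one tenth of that is about -20 dB.
const fullScaleDb = linearToDecibels(1);
const tenthDb = linearToDecibels(0.1);
const backToLinear = decibelsToLinear(tenthDb);
```

This also shows why the values are negative: any magnitude below 1 has a negative logarithm, and smaller magnitudes are more negative.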

So, to get positive values, you can simply subtract minDecibels yourself, which is exposed on the analyser node:

 //The minimum power value in the scaling range for the FFT analysis data for conversion to unsigned byte values.
 attribute double minDecibels;

Another detail is that by default, the analyser node does time smoothing, which can be disabled by setting smoothingTimeConstant to zero.

The default values in webkit are:

const double RealtimeAnalyser::DefaultSmoothingTimeConstant  = 0.8;
const double RealtimeAnalyser::DefaultMinDecibels = -100;
const double RealtimeAnalyser::DefaultMaxDecibels = -30;
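
A rough sketch of what the time smoothing does, assuming it follows the exponential averaging described in the Web Audio specification (each bin's linear magnitude is blended with its previous smoothed value before conversion to decibels). The function name smoothMagnitudes is hypothetical, and this is not the WebKit code itself.

```javascript
// Exponential averaging per frequency bin:
//   smoothed[k] = tau * previous[k] + (1 - tau) * current[k]
// tau plays the role of smoothingTimeConstant (0 disables smoothing).
function smoothMagnitudes(previous, current, tau) {
  const out = new Float32Array(current.length);
  for (let k = 0; k < current.length; k++) {
    out[k] = tau * previous[k] + (1 - tau) * current[k];
  }
  return out;
}

// With the default of 0.8, a bin that jumps from 0 to 1 only reaches 0.2 on the first frame.
const smoothedDefault = smoothMagnitudes([0, 1], [1, 1], 0.8);
// With 0, the current frame passes through unchanged.
const smoothedOff = smoothMagnitudes([0, 1], [1, 1], 0);
```

In a browser, setting analyser.smoothingTimeConstant = 0 gives you the unsmoothed spectrum of each frame.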

Sadly, even though the analyser node computes a complex FFT, it doesn't give access to the complex values, only to their magnitudes.

shapecatcher
  • By subtracting minDecibels from the float value, I sometimes don't get positive numbers. Setting the minDecibels value doesn't actually limit anything in my case; I keep getting smaller values. Any thoughts? – nevos Oct 07 '14 at 16:00

You are correct in using a Float32Array. I found an interesting tutorial on using the Audio Data API which, although that is a different API from the Web Audio API, gave me some useful insight into what you are trying to do here. I had a quick look at why the numbers are negative and didn't notice anything obvious, but I wondered whether these numbers might be in decibels (dB), which are commonly given as negative numbers with zero as the peak. The only problem with that theory is that -891 seems like an extremely low value for dB.

happy coder
  • That tutorial is for the deprecated Moz implementation from before the standard came out; I tried looking at it too! I think you're right, actually. Floats can get really small, and -891 is pretty close to 2^-128 in dB, which seems about right for what floats can store. – Newmu Jan 05 '13 at 07:48
  • If I do ln(2^-128), I get -88.7 on an HP15c (emulated on my Mac) calculator. I can't imagine any audio equipment having as little noise as -891 dB. Hmmm – happy coder Jan 05 '13 at 10:42
  • It's digital, so the noise floor can be a hard 0 and not have analog noise keeping it higher, I'm guessing. dB is 10*ln(x), not ln(x). It's much more exactly 2^-128.55. Converting back to decimal values, assuming it was dB, gave me sensible data like I've seen from FFTs before, just scaled differently. – Newmu Jan 05 '13 at 15:53
  • Yes, good point, it just looked odd from my EE background of which the training was many years ago (I graduated in 1991, and I've been doing other things for a long time now). – happy coder Jan 05 '13 at 18:10
  • @happycoder could you please remove the link to the Audio Data API, or at least rephrase that sentence to avoid future confusion? The Audio Data API and the Web Audio API are two different beasts. Thanks! – Oskar Eriksson Jan 06 '13 at 20:50
  • I won't remove the link, I think it is helpful. What would you like to see? – happy coder Jan 07 '13 at 03:52
  • Just don't call it "this API", since the link is about the Audio Data API, and the question is about the Web Audio API. There has been confusion between the two previously here at stackoverflow, and making clear they're not the same in accepted answers seems like a good thing to do. :) – Oskar Eriksson Jan 07 '13 at 08:27
  • Ok. Thanks for the fb. I'm new here, and I want to be as accurate / helpful as possible. CHEERS! I'll fix it up after this comment posting! – happy coder Jan 07 '13 at 10:18
  • Oskar, in my previous comment, I didn't mean "what would you like to see" so much as what would provide more clarity? Feel free to upvote now, or suggest other changes! (wink wink) – happy coder Jan 07 '13 at 10:24

Correct on both points in the previous answer and comments - the numbers are in decibels, so 0 is max and -infinity is min (absolute silence). -891.0... is, I believe, just a floating point conversion oddity.
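
As a rough arithmetic check of the decibel reading, assuming the usual amplitude convention of 20·log10(x): the reported silence value sits very close to a linear magnitude of 2^-148, which is near the smallest magnitudes a single-precision float can hold (the smallest subnormal is 2^-149). This only shows the number is consistent with a float-precision floor; it does not establish where in the implementation it arises.

```javascript
// Decibel value of a linear magnitude of 2^-148, assuming dB = 20 * log10(x).
const tinyMagnitude = Math.pow(2, -148);
const tinyDb = 20 * Math.log10(tinyMagnitude);

// Difference from the value reported in the question.
const reportedSilenceDb = -891.048828125;
const difference = Math.abs(tinyDb - reportedSilenceDb);
```

The difference comes out well under a thousandth of a decibel, which is within what single-precision rounding would explain.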

cwilso