25

The documentation for both of these methods is very generic wherever I look. I would like to know exactly what I'm looking at in the arrays returned by each method.

For getByteTimeDomainData, what time period is covered with each pass? I believe most oscopes cover a 32 millisecond span for each pass. Is that what is covered here as well? For the actual element values themselves, the range seems to be 0 - 255. Is this equivalent to -1 - +1 volts?

For getByteFrequencyData, the frequencies covered are based on the sampling rate, so each index corresponds to an actual frequency, but what about the element values themselves? Is there a dB range that is equivalent to the values in the returned array?

Brad.Smith
  • related question: https://stackoverflow.com/questions/60983069/web-audio-analysers-getfloattimedomaindata-buffer-offset-wrt-buffers-at-other-t – mathheadinclouds Apr 04 '20 at 00:59

3 Answers

31

getByteTimeDomainData (and the newer getFloatTimeDomainData) return an array of the size you requested - its frequencyBinCount, which is calculated as half of the requested fftSize. That array is, of course, at the current sampleRate exposed on the AudioContext, so if it's the default 2048 fftSize, frequencyBinCount will be 1024, and if your device is running at 44.1kHz, that will equate to around 23ms of data.

The byte values do range between 0-255, and yes, that maps to -1 to +1, so 128 is zero. (It's not volts, but full-range unitless values.)
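As a minimal sketch of that mapping (the helper name `byteSampleToFloat` is hypothetical, and it assumes the byte values were produced by scaling the -1 to +1 range so that 128 lands on zero):

```javascript
// Hypothetical helper: convert one value from getByteTimeDomainData (0..255)
// back to an approximate unitless sample in the -1..+1 range.
// Assumption: 128 represents zero, so one byte step is 1/128 of full scale.
function byteSampleToFloat(byteValue) {
  return (byteValue - 128) / 128;
}

console.log(byteSampleToFloat(0));   // -1
console.log(byteSampleToFloat(128)); // 0
console.log(byteSampleToFloat(255)); // 0.9921875 (just under +1)
```

The byte version is coarser than getFloatTimeDomainData, so this only recovers the sample to within about 1/128 of full scale.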

If you use getFloatFrequencyData, the values returned are in dB; if you use the Byte version, the values are mapped based on minDecibels/maxDecibels (see the minDecibels/maxDecibels description).
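A rough sketch of how a byte value could be turned back into decibels, assuming the byte range 0-255 is spread linearly between minDecibels and maxDecibels (the helper name `byteToDecibels` is hypothetical; the defaults of -100 and -30 are the AnalyserNode defaults):

```javascript
// Hypothetical helper: approximate dB level for one value from getByteFrequencyData.
// Assumption: 0 corresponds to minDecibels (or anything quieter, since values are
// clamped) and 255 corresponds to maxDecibels (or anything louder).
function byteToDecibels(byteValue, minDecibels = -100, maxDecibels = -30) {
  return minDecibels + (byteValue / 255) * (maxDecibels - minDecibels);
}

console.log(byteToDecibels(0));   // -100
console.log(byteToDecibels(255)); // -30
```

With a real analyser you would pass `analyser.minDecibels` and `analyser.maxDecibels` instead of relying on the defaults.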

cwilso
  • 1
    how did you get 2.3ms from a frequencyBinCount of 1024 and a sampling rate of 44.1kHz? – Brad.Smith Jun 06 '14 at 17:08
  • 1
    Ooops, off by a factor of ten! I should have said 23 milliseconds. 1024 samples divided by 44100 samples per second (aka Hertz) equals 0.023219... seconds. – cwilso Jun 06 '14 at 22:10
  • 4
    Why is `frequencyBinCount` used as the width of the time domain data? Is there some relationship between the time domain window and the frequency bin count for the fft that I'm missing here? –  Mar 21 '17 at 10:53
  • 1
    That's how an FFT works - you have symmetry between the required length of the time-domain audio data and the frequencies. – cwilso Mar 29 '17 at 16:20
  • 3
    @cwilso: you have it backwards, see my answer. https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNode/getFloatTimeDomainData https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNode/getFloatFrequencyData – mathheadinclouds Mar 31 '20 at 14:55
  • 1
    @cwilso: I'm just now looking through your code for pitch detection via auto-correlation, which you have on your github page. You have a bug in there which matches your error here in your answer. You're setting global variable buflen to half of analyser.fftSize, but it should be equal to the fftSize. You can see it if you use your DEBUGCANVAS with id 'waveform', yank the width up to 2048 (=fftSize), and you'll see that there 'is something' the whole 2048 samples, i.e. you see oscillation on the canvas. If you go greater 2048, then you have a gap right of 2048. so, it's 100% certain what I say. – mathheadinclouds Mar 31 '20 at 16:23
  • 1
    @user993683: fftSize is to be used as the width of the time domain data, not frequencyBinCount. The only relation of frequencyBinCount to the width of the time domain data is that it is half of it. See my answer for details. – mathheadinclouds Jun 03 '20 at 20:13
  • @cwilso , isn't @mathheadinclouds correct, that the `TimeDomain` array data is the same length as `fftSize`? According to [Mozilla's documentation](https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNode/getFloatTimeDomainData); "Float32Array needs to be the same length as the fftSize": , `var dataArray = new Float32Array(analyser.fftSize); // Float32Array needs to be the same length as the fftSize analyser.getFloatTimeDomainData(dataArray);` – Nate Anderson Aug 18 '20 at 04:25
16

Mozilla's documentation describes the difference between getFloatTimeDomainData and getFloatFrequencyData, which I summarize below. The Mozilla docs reference a Web Audio experiment, the voice-change-o-matic, which illustrated the conceptual difference for me (it only works in my Firefox browser; it does not work in my Chrome browser).

TimeDomain/getFloatTimeDomainData

  • TimeDomain functions are over some span of time.
  • We often visualize TimeDomain data using oscilloscopes.
  • In other words:
    • we visualize TimeDomain data with a line chart,
    • where the x-axis (aka the "original domain") is time
    • and the y axis is a measure of a signal (aka the "amplitude").
  • Change the voice-change-o-matic "visualizer setting" to Sinewave to see getFloatTimeDomainData(...)

visualizer-setting to Sinewave illustrates TimeDomain data like an oscilloscope

Frequency/getFloatFrequencyData

  • Frequency functions (getByteFrequencyData) are at a point in time, i.e. right now: "the current frequency data"
  • We sometimes see these in mp3 players/ "winamp bargraph style" music players (aka "equalizer" visualizations).
  • In other words:
    • we visualize Frequency data with a bar graph
    • where the x-axis (aka "domain") is frequencies or frequency bands
    • and the y-axis is the strength of each frequency band
  • Change the voice-change-o-matic "visualizer setting" to Frequency bars to see getFloatFrequencyData(...)

visualizer-setting to Frequency bars illustrates Frequency data like an mp3 player
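To put numbers on the x-axis of such a bar graph, here is a small sketch under the usual FFT convention that bin i sits at i * sampleRate / fftSize (the helper name `binFrequencyHz` is hypothetical):

```javascript
// Hypothetical helper: frequency in Hz represented by one index of the
// frequency data array. Assumption: bins are spaced linearly, sampleRate / fftSize apart.
function binFrequencyHz(binIndex, sampleRate, fftSize) {
  return (binIndex * sampleRate) / fftSize;
}

const sampleRate = 48000;
const fftSize = 2048;
const frequencyBinCount = fftSize / 2; // 1024 bars

console.log(binFrequencyHz(1, sampleRate, fftSize));                     // 23.4375
console.log(binFrequencyHz(frequencyBinCount - 1, sampleRate, fftSize)); // 23976.5625
```

So at 48 kHz with the default fftSize, each bar covers roughly 23 Hz and the last bar sits just below half the sample rate.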

Fourier Transform (aka Fast Fourier Transform/FFT)

  • Another way to think about "time domain vs frequency" is shown in the diagram below, from the Fast Fourier Transform Wikipedia article
    • getFloatTimeDomainData gives you the chart on the top (x-axis is Time)
    • getFloatFrequencyData gives you the chart on the bottom (x-axis is Frequency)
    • a Fast Fourier Transform (FFT) converts the Time Domain data into Frequency data, in other words, FFT converts the first chart to the second chart.

Fast Fourier Transform (FFT) converts Time Domain data to Frequency data (original source: https://en.wikipedia.org/wiki/Fast_Fourier_transform#/media/File:FFT_of_Cosine_Summation_Function.svg)

Nate Anderson
  • 1
    Taking what it says: "In other words, we visualize Frequency data with a bar graph, where the x-axis are frequency bands, and the y-axis is the strength of each frequency band" It could be said that each element of the array represents the volume in dB per bin? – sebas.varela Apr 12 '21 at 04:41
  • 1
    Yes @SebastiánVarellaGmz , you are referring to `getFloatFrequencyData`, and [as the documentation says](https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNode/getFloatFrequencyData): "Each item in the array represents the **decibel value** for a **specific frequency**. The frequencies are spread linearly from 0 to 1/2 of the sample rate. For example, for a 48000 Hz sample rate, the last item of the array will represent the decibel value for 24000 Hz." – Nate Anderson Apr 14 '21 at 23:16
10

cwilso has it backwards.

The time data array is the longer one (fftSize), and the frequency data array is the shorter one (half that, frequencyBinCount).

An fftSize of 2048 at the usual sample rate of 44.1kHz means each sample has a duration of 1/44100 seconds. You have 2048 samples at hand, and thus are covering a duration of 2048/44100 seconds, which is about 46 milliseconds, not 23 milliseconds. The frequencyBinCount is indeed 1024, but that refers to the frequency domain (as the name suggests), not the time domain, and the computation 1024/44100, in this context, is about as meaningful as adding your birth date to the fftSize.
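The arithmetic can be checked with a short sketch (the helper name `windowDurationMs` is hypothetical; with a real analyser you would pass `analyser.fftSize` and `audioContext.sampleRate`):

```javascript
// Hypothetical helper: how many milliseconds of audio a buffer of
// sampleCount samples covers at the given sample rate.
function windowDurationMs(sampleCount, sampleRate) {
  return (sampleCount / sampleRate) * 1000;
}

// fftSize samples: the full time-domain window, about 46.44 ms.
console.log(windowDurationMs(2048, 44100).toFixed(2)); // "46.44"
// frequencyBinCount samples would only be half of that window, about 23.22 ms.
console.log(windowDurationMs(1024, 44100).toFixed(2)); // "23.22"
```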

A little math illustrating what's happening: Fourier transform is a 'vector space isomorphism', that is, a mapping going bijectively (i.e., reversible) between 2 vector spaces of the same dimension; the 'time domain' and the 'frequency domain.' The vector space dimension we have here (in both cases) is fftSize.

So where does the 'half' come from? The frequency domain coefficients 'count double'. Either because they 'actually are' complex numbers, or because you have the 'sin' and the 'cos' flavor. Or, because you have a 'magnitude' and a 'phase', which you'll understand if you know how complex numbers work. (Those are three ways of saying the same thing in different jargon, so to speak.)

I don't know why the API only gives us half of the relevant numbers when it comes to frequency - I can only guess. And my guess is that those are the 'magnitude' numbers, and the 'phase' numbers are thrown out. The reason that this is my guess is that in applications, magnitude is far more important than phase. Still, I'm quite surprised that the API throws out information, and I'd be glad if some expert who actually knows (and isn't guessing) can confirm that it's indeed the magnitude. Or - even better (I love to learn) - correct me.
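To illustrate the 'count double' idea, here is a naive discrete Fourier transform sketch. It is not what the browser actually runs (the real AnalyserNode also windows and smooths the data, which is left out here), and the function name `dftMagnitudeAndPhase` is hypothetical. It only shows that n real samples give n/2 bins that each carry two numbers, a magnitude and a phase:

```javascript
// Naive O(n^2) DFT for illustration only. Keeps the first n/2 bins,
// mirroring frequencyBinCount = fftSize / 2.
function dftMagnitudeAndPhase(samples) {
  const n = samples.length;
  const bins = [];
  for (let k = 0; k < n / 2; k++) {
    let re = 0; // 'cos' flavor
    let im = 0; // 'sin' flavor
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    bins.push({
      magnitude: Math.hypot(re, im) / n, // the part an analyser-style view keeps
      phase: Math.atan2(im, re),         // the part that gets discarded
    });
  }
  return bins;
}

// 8 samples of a cosine that completes 2 cycles over the window.
const sampleCount = 8;
const samples = Array.from({ length: sampleCount }, (_, t) =>
  Math.cos((2 * Math.PI * 2 * t) / sampleCount)
);
const bins = dftMagnitudeAndPhase(samples);
console.log(bins.length);                    // 4
console.log(bins[2].magnitude.toFixed(3));   // "0.500"
```

Eight time-domain numbers go in, and four (magnitude, phase) pairs come out, with the energy concentrated in bin 2.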

mathheadinclouds
  • I came across this thread in the WebAudio API issues that explains why we only get magnitude vals. https://github.com/WebAudio/web-audio-api-v2/issues/107#issuecomment-742704691 – meta-meta Nov 08 '22 at 03:01