
(Question rewritten to integrate bits of information from the answers, and to make it more concise.)

I use analyser=audioContext.createAnalyser() in order to process audio data, and I'm trying to understand the details better.

I choose an fftSize, say 2048, then I create an array buffer of 2048 floats with Float32Array, and then, in an animation loop (called 60 times per second on most machines, via window.requestAnimationFrame), I do

analyser.getFloatTimeDomainData(buffer);

which will fill my buffer with 2048 floating point sample data points.

When the handler is called the next time, 1/60 second has passed. To calculate how much that is in units of samples, we have to divide it by the duration of 1 sample, and get (1/60)/(1/44100) = 735. So the next handler call takes place (on average) 735 samples later.
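In code, that calculation looks like this (the 60 fps figure is of course just an assumption about the display refresh rate):

var sampleRate = 44100;               // samples per second
var frameDuration = 1 / 60;           // ideal time between requestAnimationFrame callbacks
var sampleDuration = 1 / sampleRate;  // duration of one sample
console.log(frameDuration / sampleDuration); // 735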

So there is overlap between subsequent buffers, like this:

[image: overlap between subsequent buffers]

We know from the spec (search for 'render quantum') that everything happens in "chunk sizes" which are multiples of 128. So (in terms of audio processing), one would expect that the next handler call will usually be either 5*128 = 640 samples later, or else 6*128 = 768 samples later - those being the multiples of 128 closest to 735 samples = (1/60) second.
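In code, rounding 735 down and up to the neighboring multiples of the render quantum:

var renderQuantum = 128;
var samplesPerFrame = 735;
console.log(Math.floor(samplesPerFrame / renderQuantum) * renderQuantum); // 640
console.log(Math.ceil(samplesPerFrame / renderQuantum) * renderQuantum);  // 768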

Calling this amount "Δ-samples", how do I find out what it is (during each handler call), 640 or 768 or something else?

Reliably, like this:

Consider the 'old buffer' (from the previous handler call). If you delete "Δ-samples" many samples at the beginning, copy the remainder, and then append "Δ-samples" many new samples, that should be the current buffer. And indeed, I tried that, and that is the case. It turns out "Δ-samples" is often 384, 512, or 896. It is trivial but time consuming to determine "Δ-samples" in a loop.
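A sketch of that loop (the helper name findDeltaSamples is made up; my actual code below does the comparison slightly differently):

function findDeltaSamples(oldBuf, newBuf) {
  // slide the new buffer against the old one, in steps of the 128-sample render quantum,
  // until the overlapping region matches
  for (var delta = 0; delta <= oldBuf.length; delta += 128) {
    var matches = true;
    for (var i = 0; i < oldBuf.length - delta; i++) {
      if (oldBuf[delta + i] !== newBuf[i]) {
        matches = false;
        break;
      }
    }
    if (matches) return delta; // the new buffer starts 'delta' samples after the old one
  }
  return -1; // should not happen if the buffers really do overlap
}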

I would like to compute "Δ-samples" without performing that loop.

One would think the following would work:

(audioContext.currentTime - (value of audioContext.currentTime during the previous handler call)) / (duration of 1 sample)

I tried that (see code below, where I also "stitch together" the various buffers, trying to reconstruct the original buffer), and - surprise - it works about 99.9% of the time in Chrome, and about 95% of the time in Firefox.
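Stripped down to just the timing part, that attempt looks like this (variable names are illustrative; the full code is below):

var previousTime = null;

function onFrame() {
  var now = audioContext.currentTime; // seconds, on the audio clock
  if (previousTime !== null) {
    var deltaSamples = Math.round((now - previousTime) * audioContext.sampleRate);
    // hoped-for "Δ-samples"; correct about 99.9% of the time in Chrome, about 95% in Firefox
  }
  previousTime = now;
  window.requestAnimationFrame(onFrame);
}
window.requestAnimationFrame(onFrame);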

I also tried audioContext.getOutputTimestamp().contextTime, which does not work in Chrome, and works roughly 90-something percent of the time in Firefox.
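That variant differs only in the time source (sketch; previousContextTime is the value saved during the previous handler call, analogous to previousTime above):

// instead of audioContext.currentTime:
var contextTime = audioContext.getOutputTimestamp().contextTime;
var deltaSamples = Math.round((contextTime - previousContextTime) * audioContext.sampleRate);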

Is there any way to find "Δ-samples" (without looking at the buffers), which works reliably?

Second question: the "reconstructed" buffer (all the buffers from the callbacks stitched together) and the original sound buffer are not exactly the same; there is a small but noticeable difference, larger than ordinary rounding error, and it is bigger in Firefox.

Where does that come from? As I understand the spec, those should be the same.

var soundFile = 'https://mathheadinclouds.github.io/audio/sounds/la.mp3';
var audioContext = null;
var isPlaying = false;
var sourceNode = null;
var analyser = null;
var theBuffer = null;
var reconstructedBuffer = null;
var soundRequest = null;
var loopCounter = -1;
var FFT_SIZE = 2048;
var rafID = null;
var buffers = [];
var timesSamples = [];
var timeSampleDiffs = [];
var leadingWaste = 0;

window.addEventListener('load', function() {
  soundRequest = new XMLHttpRequest();
  soundRequest.open("GET", soundFile, true);
  soundRequest.responseType = "arraybuffer";
  //soundRequest.onload = function(evt) {}
  soundRequest.send();
  var btn = document.createElement('button');
  btn.textContent = 'go';
  btn.addEventListener('click', function(evt) {
    goButtonClick(this, evt)
  });
  document.body.appendChild(btn);
});

function goButtonClick(elt, evt) {
  initAudioContext(togglePlayback);
  elt.parentElement.removeChild(elt);
}

function initAudioContext(callback) {
  audioContext = new AudioContext();
  audioContext.decodeAudioData(soundRequest.response, function(buffer) {
    theBuffer = buffer;
    callback();
  });
}

function createAnalyser() {
  analyser = audioContext.createAnalyser();
  analyser.fftSize = FFT_SIZE;
}

function startWithSourceNode() {
  sourceNode.connect(analyser);
  analyser.connect(audioContext.destination);
  sourceNode.start(0);
  isPlaying = true;
  sourceNode.addEventListener('ended', function(evt) {
    sourceNode = null;
    analyser = null;
    isPlaying = false;
    loopCounter = -1;
    window.cancelAnimationFrame(rafID);
    console.log('buffer length', theBuffer.length);
    console.log('reconstructedBuffer length', reconstructedBuffer.length);
    console.log('audio callback called counter', buffers.length);
    console.log('root mean square error', Math.sqrt(checkResult() / theBuffer.length));
    console.log('lengths of time between requestAnimationFrame callbacks, measured in audio samples:');
    console.log(timeSampleDiffs);
    console.log(
      timeSampleDiffs.filter(function(val) {
        return val === 384
      }).length,
      timeSampleDiffs.filter(function(val) {
        return val === 512
      }).length,
      timeSampleDiffs.filter(function(val) {
        return val === 640
      }).length,
      timeSampleDiffs.filter(function(val) {
        return val === 768
      }).length,
      timeSampleDiffs.filter(function(val) {
        return val === 896
      }).length,
      '*',
      timeSampleDiffs.filter(function(val) {
        return val > 896
      }).length,
      timeSampleDiffs.filter(function(val) {
        return val < 384
      }).length
    );
    console.log(
      timeSampleDiffs.filter(function(val) {
        return val === 384
      }).length +
      timeSampleDiffs.filter(function(val) {
        return val === 512
      }).length +
      timeSampleDiffs.filter(function(val) {
        return val === 640
      }).length +
      timeSampleDiffs.filter(function(val) {
        return val === 768
      }).length +
      timeSampleDiffs.filter(function(val) {
        return val === 896
      }).length
    )
  });
  myAudioCallback();
}

function togglePlayback() {
  sourceNode = audioContext.createBufferSource();
  sourceNode.buffer = theBuffer;
  createAnalyser();
  startWithSourceNode();
}

function myAudioCallback(time) {
  ++loopCounter;
  if (!buffers[loopCounter]) {
    buffers[loopCounter] = new Float32Array(FFT_SIZE);
  }
  var buf = buffers[loopCounter];
  // snapshot of the analyser's current time domain data (FFT_SIZE samples)
  analyser.getFloatTimeDomainData(buf);
  // timestamp of this callback on the audio clock, converted to samples
  var now = audioContext.currentTime;
  var nowSamp = Math.round(audioContext.sampleRate * now);
  timesSamples[loopCounter] = nowSamp;
  var j, sampDiff;
  if (loopCounter === 0) {
    console.log('start sample: ', nowSamp);
    reconstructedBuffer = new Float32Array(theBuffer.length + FFT_SIZE + nowSamp);
    leadingWaste = nowSamp;
    for (j = 0; j < FFT_SIZE; j++) {
      reconstructedBuffer[nowSamp + j] = buf[j];
    }
  } else {
    // "Δ-samples", computed from the currentTime difference
    sampDiff = nowSamp - timesSamples[loopCounter - 1];
    timeSampleDiffs.push(sampDiff);
    // the first (FFT_SIZE - sampDiff) samples should overlap with the previous buffer
    var expectedEqual = FFT_SIZE - sampDiff;
    for (j = 0; j < expectedEqual; j++) {
      if (reconstructedBuffer[nowSamp + j] !== buf[j]) {
        console.error('unexpected error', loopCounter, j);
        // debugger;
      }
    }
    // append the genuinely new samples to the reconstruction
    for (j = expectedEqual; j < FFT_SIZE; j++) {
      reconstructedBuffer[nowSamp + j] = buf[j];
    }
    //console.log(loopCounter, nowSamp, sampDiff);
  }
  rafID = window.requestAnimationFrame(myAudioCallback);
}

function checkResult() {
  var ch0 = theBuffer.getChannelData(0);
  var ch1 = theBuffer.getChannelData(1);
  var sum = 0;
  var idxDelta = leadingWaste + FFT_SIZE;
  for (var i = 0; i < theBuffer.length; i++) {
    var samp0 = ch0[i];
    var samp1 = ch1[i];
    var samp = (samp0 + samp1) / 2;
    var check = reconstructedBuffer[i + idxDelta];
    var diff = samp - check;
    var sqDiff = diff * diff;
    sum += sqDiff;
  }
  return sum;
}

In the snippet above, I do the following: I load a 1-second mp3 audio file from my github.io page with XMLHttpRequest (I sing 'la' for 1 second). After it has loaded, a button labeled 'go' is shown, and after pressing that, the audio is played back by putting it into a bufferSource node and then calling .start on that. The bufferSource is then fed to our analyser, et cetera.

related question

I also have the snippet code on my github.io page - makes reading the console easier.

mathheadinclouds
  • experiments I made have shown that if the "Δ-samples", computed as the question elaborates, is off, it's always too low, never too high, and the amount by which it's too low is always a multiple of 128. – mathheadinclouds Apr 04 '20 at 02:03

3 Answers


Unfortunately there is no way to find out the exact point in time at which the data returned by an AnalyserNode was captured. But you might be on the right track with your current approach.

All the values returned by the AnalyserNode are based on the "current-time-domain-data". This is basically the internal buffer of the AnalyserNode at a certain point in time. Since the Web Audio API has a fixed render quantum of 128 samples I would expect this buffer to evolve in steps of 128 samples as well. But currentTime usually evolves in steps of 128 samples already.
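You can check that expectation with something like this (just a sketch; audioContext here stands for the AudioContext from the question):

const quantumSeconds = 128 / audioContext.sampleRate;
const quanta = audioContext.currentTime / quantumSeconds;
// if currentTime advances in whole render quanta, these two values agree (up to floating point)
console.log(quanta, Math.round(quanta));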

Furthermore the AnalyserNode has a smoothingTimeConstant property. It is responsible for "blurring" the returned values. The default value is 0.8. For your use case you probably want to set this to 0.

EDIT: As Raymond Toy pointed out in the comments, the smoothingTimeConstant only has an effect on the frequency data. Since the question is about getFloatTimeDomainData() it will have no effect on the returned values.

I hope this helps but I think it would be easier to get all the samples of your audio signal by using an AudioWorklet. It would definitely be more reliable.
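A minimal sketch of that approach could look like the following (the file name, processor name and message format are just placeholders, not a finished implementation):

// capture-processor.js (runs on the audio rendering thread)
class CaptureProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const input = inputs[0];
    if (input.length > 0) {
      // copy the 128-sample block, the underlying buffer gets reused
      this.port.postMessage(input[0].slice(0));
    }
    return true; // keep the processor alive
  }
}
registerProcessor('capture-processor', CaptureProcessor);

// main thread (inside an async function)
await audioContext.audioWorklet.addModule('capture-processor.js');
const captureNode = new AudioWorkletNode(audioContext, 'capture-processor');
sourceNode.connect(captureNode);
captureNode.port.onmessage = (event) => {
  // event.data is a Float32Array of 128 samples, delivered in order and without overlap
};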

chrisguttandin
  • ah, the spec mentions the Blackman window. That explains a lot - such as the blurring, at least potentially. Thank you! I looked at smoothingTimeConstant, and fiddled around with that. It had no effect whatsoever. Also, I conjectured right away that Firefox might have a different smoothingTimeConstant, which would explain the higher rms error in FF. But not so - it's also 0.8 in FF, just as in Chrome. Strange. The spec calls 128 the 'render quantum', good point. Do you have example code for AudioWorklet? – mathheadinclouds Apr 02 '20 at 12:20
  • The Blackman window and smoothingTimeConstant only apply when you want the frequency data. The time domain data is not modified in any way. – Raymond Toy Apr 02 '20 at 15:33
  • Thanks Raymond. I edited the answer to mention that the `smoothingtimeconstant` will have no effect. – chrisguttandin Apr 03 '20 at 14:05
  • Sorry mathheadinclouds, for the misleading info on the `smoothingTimeConstant`. The Chrome team has created some useful demos which show how the AudioWorklet can be used. https://googlechromelabs.github.io/web-audio-samples/audio-worklet/ – chrisguttandin Apr 03 '20 at 14:07
  • thank you for the AudioWorklet examples link. It appears Firefox doesn't have AudioWorklet yet. Also, somehow, if I copy the code to my local webserver, it stops working - I get '(index):19 Uncaught (in promise) DOMException: The user aborted a request.' Anyway, not supporting Firefox is not an option. So I guess I'll go with Raymond's advice and use ScriptProcessorNode. – mathheadinclouds Apr 04 '20 at 03:28
  • Firefox support is on the way. It's enabled in Nightly already. You could also use a polyfill like [standardized-audio-context](https://github.com/chrisguttandin/standardized-audio-context), [GoogleChromeLabs/audioworklet-polyfill](https://github.com/GoogleChromeLabs/audioworklet-polyfill) or [jariseon/audioworklet-polyfill](https://github.com/jariseon/audioworklet-polyfill). They all use the `AudioWorklet` if it is available and otherwise fall back to the `ScriptProcessorNode`. – chrisguttandin Apr 04 '20 at 09:37

I think the AnalyserNode is not what you want in this situation. You want to grab the data and keep it synchronized with raf. Use a ScriptProcessorNode or AudioWorkletNode to grab the data. Then you'll get all the data as it comes. No problems with overlap, or missing data or anything.

Note also that the clocks for raf and audio may be different and hence things may drift over time. You'll have to compensate for that yourself if you need to.
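For example, something along these lines (buffer size and variable names are just placeholders):

// 2048-sample buffer, 1 input channel, 1 output channel
const scriptNode = audioContext.createScriptProcessor(2048, 1, 1);
sourceNode.connect(scriptNode);
scriptNode.connect(audioContext.destination); // some browsers only fire onaudioprocess when connected
const chunks = [];
scriptNode.onaudioprocess = (event) => {
  const input = event.inputBuffer.getChannelData(0);
  chunks.push(new Float32Array(input));            // keep a copy, the buffer gets reused
  event.outputBuffer.getChannelData(0).set(input); // pass the audio through so it stays audible
  // event.playbackTime tells you where this chunk sits on the context's timeline
};
// then, in your requestAnimationFrame callback, read whatever range you need from `chunks`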

Raymond Toy
  • I'm confused. ScriptProcessorNode has .onaudioprocess to which you feed a callback function, which is already being called periodically. So my first guess would be that you do everything in that callback function. So where does raf (requestAnimationFrame) come in? What would I need that for? It might make total sense what you're saying, but without example code, I'm not quite understanding what you mean. As for the different clocks, yes, indeed. I'm trying to use the "audio time" only. Are you suggesting the same or something different? – mathheadinclouds Apr 02 '20 at 16:32
  • I have the impulse to accept your answer - because I think you're right. I really should be using ScriptProcessorNode or AudioWorkletNode. As for ScriptProcessorNode, I checked and confirmed: no overlap. Then again, I'd like to know if it's possible to reliably find those buffer overlap amounts of AnalyserNode without looking into the buffers. Still hoping that I'm wrong, and that it's possible. – mathheadinclouds Apr 02 '20 at 17:33
  • For the first question, you can probably buffer the data received from ScriptProcessorNode, and when raf is called, grab the appropriate set of data from the buffer. Or you could just update the graph whenever you have a new buffer. I don't do graphics/raf, so I'm not really knowledgeable here. – Raymond Toy Apr 02 '20 at 21:50
  • For your second question, I think you pretty much have to examine the data from an AnalyserNode. The timing from raf isn't perfect, and neither is the timing of what data you get from an AnalyserNode; because the audio thread is running independently of the main thread, it can update data at unexpected times. (But, of course, not while you're reading the data out!) – Raymond Toy Apr 02 '20 at 21:52

I'm not really following your math, so I can't tell exactly what you had wrong, but you seem to be looking at this in an overly complicated manner.

The fftSize doesn't really matter here; what you want to calculate is how many samples have passed since the last frame.

To calculate this, you just need to

  • Measure the time elapsed from last frame.
  • Divide this time by the time of a single frame.

The time of a single frame (i.e. of a single sample) is simply 1 / context.sampleRate.
So really all you need is (currentTime - previousTime) / (1 / sampleRate), and you'll find the index in the last frame where the data starts being repeated in the new one.

And only then, if you want the index in the new frame, you'd subtract this index from the fftSize.

Now for why you sometimes have gaps: it's because AudioContext.prototype.currentTime returns the timestamp of the beginning of the next block to be passed to the graph.
The one we want here is AudioContext.prototype.getOutputTimestamp().contextTime, which represents the timestamp of now, on the same time base as currentTime (i.e. the creation of the context).

(function loop(){requestAnimationFrame(loop);})();
(async()=>{
  const ctx = new AudioContext();
  
  const buf = await fetch("https://upload.wikimedia.org/wikipedia/en/d/d3/Beach_Boys_-_Good_Vibrations.ogg").then(r=>r.arrayBuffer());
  const aud_buf = await ctx.decodeAudioData(buf);
  const source = ctx.createBufferSource();
  source.buffer = aud_buf;
  source.loop = true;
  
  const analyser = ctx.createAnalyser();
  const fftSize = analyser.fftSize = 2048;
  source.connect( analyser );
  source.start(0);
  
  // for debugging we use two different buffers
  const arr1 = new Float32Array( fftSize );
  const arr2 = new Float32Array( fftSize );

  const single_sample_dur = (1 / ctx.sampleRate);
  console.log( 'single sample duration (ms)', single_sample_dur * 1000);

  onclick = e => {
    if( ctx.state === "suspended" ) {
      ctx.resume();
      return console.log( 'starting context, please try again' );
    }
    
    console.log( '-------------' );
    
    requestAnimationFrame( () => {
      // first frame
      const time1 = ctx.getOutputTimestamp().contextTime;
      analyser.getFloatTimeDomainData( arr1 );
      
      requestAnimationFrame( () => {
        // second frame
        const time2 = ctx.getOutputTimestamp().contextTime;
        analyser.getFloatTimeDomainData( arr2 );
                
        const elapsed_time = time2 - time1;
        console.log( 'elapsed time between two frame (ms)', elapsed_time * 1000 );
        
        const calculated_index = fftSize - Math.round( elapsed_time / single_sample_dur );
        console.log( 'calculated index of new data', calculated_index );

        // for debugging we can just search for the first index where the data repeats
        const real_time = fftSize - arr1.indexOf( arr2[ 0 ] );
        console.log( 'real index', real_time > fftSize ? 0 : real_time );
        
        if( calculated_index !== (real_time > fftSize ? 0 : real_time) ) {
          console.error( 'different' );
        }
       
      });
    });
  };
  document.body.classList.add('ready');

})().catch( console.error );
body:not(.ready) pre { display: none; }
<pre>click to record two new frames</pre>
Kaiido
  • I played around with `.getOutputTimestamp()`, and that doesn't do the trick. The proper offset between one buffer and the next is always a multiple of 128 (called 'render quantum'). With diff(.currentTime)/(1/sampleRate), the result is (most of the time but) not always correct, but it is always a multiple of 128. With `.getOutputTimestamp()` - both with .contextTime and with .performanceTime/1000 -, the expression diff(...)/... will not be a multiple of 128, usually. So no, that looked like a good guess, but that didn't help either. – mathheadinclouds Apr 02 '20 at 14:02
  • No time right now to triple check (will do tomorrow), and I wrote that in a hurry earlier, but fast rereading the specs I linked to, getOutputTimestamp.contextTime is the time of the last sample passed to the graph, which was also increased by that "render quantum". It should also be a multiple of 128. – Kaiido Apr 02 '20 at 15:14
  • the differences between subsequent calls to getOutputTimestamp.contextTime * sampleRate are NOT multiples of 128, I tried that. If you want to double check that tomorrow, thank you in advance. – mathheadinclouds Apr 02 '20 at 15:32
  • @mathheadinclouds That would be a Chrome bug then. In my FF they clearly are multiples of 128. But there is still a discrepancy... (ten wrong calculations over the 21.13s audio I'm using). – Kaiido Apr 03 '20 at 01:51
  • Indeed, I tried it also, and audioContext.getOutputTimestamp().contextTime works on Firefox - most of the time. Strange that whatever you do, it works only most of the time. Maybe it will work next year. – mathheadinclouds Apr 04 '20 at 00:20