
I am currently working on implementing a streaming audio feature, and I've encountered an issue related to merging audio buffers using the AudioContext. My goal is to fetch 5-second audio chunks and play them to create a continuous audio stream.

Here's what I've done so far:

  1. I fetch the first 5-second audio chunk, decode it, and store it in a variable as an AudioBuffer.
  2. When the user clicks the "Play" button, I fetch the other chunks and merge each one into the first AudioBuffer, storing the result back in the same variable.

The problem arises when transitioning from one chunk to another during playback: there is a noticeable gap between the chunks.

I suspect that this gap comes from the process of merging subsequent audio chunks into the initial AudioBuffer. As playback progresses from, for instance, 00:04 to 00:05, the pause becomes evident.

How can I effectively merge audio buffers in a way that eliminates or minimizes these gaps between chunks? I want the audio to play back smoothly.

Here is also a demo example of the issue; click Play and you will notice the gaps:

import audios, { preBuffer } from "./data";
import { fetchDecode, mergeAudioBuffers } from "./utils";

const playButton = document.getElementById("play") as HTMLButtonElement;

let ctx: AudioContext;
let combinedAudioBuffers: AudioBuffer;
let source: AudioBufferSourceNode;
let startTime = 0;
let playbackTime = 0;

// decode the first chunk before streaming starts
window.onload = async () => {
  ctx = new AudioContext();
  const arrayBuffer: ArrayBuffer = await fetchDecode(preBuffer);
  const audioBuffer: AudioBuffer = await ctx.decodeAudioData(arrayBuffer);
  combinedAudioBuffers = audioBuffer;
  const src: AudioBufferSourceNode = ctx.createBufferSource();
  src.buffer = audioBuffer;
  src.connect(ctx.destination);
  source = src;
};

playButton.addEventListener("click", async () => {
  startTime = Date.now();
  source.start(0);
  playButton.innerHTML = "Playing";
  playButton.disabled = true;

  // decode the remaining URL chunks, merge each into the combined AudioBuffer, and restart playback from the current position
  for (let audio of audios) {
    const arraybuffer = await fetchDecode(audio);
    const decodeBuffer = await ctx.decodeAudioData(arraybuffer);
    const mergeTwoBuffers = mergeAudioBuffers(
      ctx,
      combinedAudioBuffers,
      decodeBuffer
    );
    combinedAudioBuffers = mergeTwoBuffers;
    playbackTime = Date.now();
    let playback = (playbackTime - startTime) / 1000;

    source.stop();

    source = ctx.createBufferSource();
    source.buffer = combinedAudioBuffers;
    source.connect(ctx.destination);

    source.start(0, playback);
  }
});

callmenikk

1 Answer


(I'm assuming your merge code is good... you didn't show it to us, so we don't know either way...)
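(For reference, a typical merge helper looks roughly like the sketch below. This assumes plain concatenation of the decoded channel data into a new, longer AudioBuffer; your actual mergeAudioBuffers may differ.)

// Hypothetical mergeAudioBuffers: concatenate two decoded buffers into one
// longer AudioBuffer by copying their channel data back to back.
function mergeAudioBuffers(
  ctx: AudioContext,
  first: AudioBuffer,
  second: AudioBuffer
): AudioBuffer {
  const channels = Math.min(first.numberOfChannels, second.numberOfChannels);
  const merged = ctx.createBuffer(channels, first.length + second.length, first.sampleRate);
  for (let ch = 0; ch < channels; ch++) {
    const data = merged.getChannelData(ch);
    data.set(first.getChannelData(ch), 0);             // first buffer at the start
    data.set(second.getChannelData(ch), first.length); // second buffer right after it
  }
  return merged;
}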

Generally, you can't do this sort of split-and-merge with lossy codecs, at least without some cooperation on the encoder end.

You're using MP3, which has the concept of a 'frame' that encodes a fixed block of audio samples (1152 per frame for MPEG-1 Layer III). So, at the very least, you need to split on a frame boundary, not at an arbitrary amount of time.
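For illustration, here is a rough sketch of how those boundaries can be located by scanning frame headers. It assumes MPEG-1 Layer III, ignores ID3 tags and free-format bitrates, and the function name mp3FrameOffsets is made up for this example:

// Sketch: find MPEG-1 Layer III frame boundaries in an MP3 byte stream so a
// file can be split between frames instead of at arbitrary byte offsets.
// (The bit reservoir can still tie a frame to earlier frames.)
const BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0];
const SAMPLE_RATES = [44100, 48000, 32000, 0];

function mp3FrameOffsets(bytes: Uint8Array): number[] {
  const offsets: number[] = [];
  let pos = 0;
  while (pos + 4 <= bytes.length) {
    // 11-bit sync word: 0xFF followed by the top 3 bits of the next byte set
    const isSync = bytes[pos] === 0xff && (bytes[pos + 1] & 0xe0) === 0xe0;
    // version bits 0b11 = MPEG-1, layer bits 0b01 = Layer III
    const isMpeg1Layer3 =
      ((bytes[pos + 1] >> 3) & 0x03) === 0x03 && ((bytes[pos + 1] >> 1) & 0x03) === 0x01;
    const bitrate = BITRATES_KBPS[(bytes[pos + 2] >> 4) & 0x0f] * 1000;
    const sampleRate = SAMPLE_RATES[(bytes[pos + 2] >> 2) & 0x03];
    const padding = (bytes[pos + 2] >> 1) & 0x01;
    if (!isSync || !isMpeg1Layer3 || !bitrate || !sampleRate) {
      pos++; // not a valid frame header here, keep scanning
      continue;
    }
    // MPEG-1 Layer III: 1152 samples per frame -> 144 * bitrate / sampleRate bytes
    offsets.push(pos);
    pos += Math.floor((144 * bitrate) / sampleRate) + padding;
  }
  return offsets;
}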

It's worse than that, though, because a frame can depend on a chunk of data in another frame. This is the bit reservoir, and it's a sort of hack to use more bits for complex passages and fewer bits for the easy stuff. Sort of a VBR within a CBR stream. In any case, it means that you can't correctly decode an arbitrary frame by itself. You potentially need the surrounding frames to do that.

Additionally, a normal MP3 stream doesn't have any standard way to signal the encoder delay to the decoder, so gapless playback of MP3 is not possible without some modifications. Encoders normally insert a couple of frames of silence to allow the decoder to initialize.

So, all that being said:

  • Are you actually sure you need to do this chunked? Browsers are good at streaming on their own. Even if you need to do some tweaking of the stream, you can use MediaSource Extensions (see the sketch after this list).

  • If you must chunk for some reason, consider reusing HLS. It's a well-implemented standard that generally uses AAC in MP4/ISOBMFF files for audio. Then you don't have to re-implement any of this on either the encoding or the decoding side (see the hls.js sketch below).
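For illustration, here is a rough MediaSource sketch that reuses the audios list and playButton from your demo. It assumes the chunks, appended in order, form a valid MP3 stream and that the browser supports "audio/mpeg" in MSE (you can check with MediaSource.isTypeSupported("audio/mpeg")):

// Minimal MediaSource sketch: one <audio> element, one SourceBuffer, and the
// browser handles decoding and buffering.
const audioEl = new Audio();
const mediaSource = new MediaSource();
audioEl.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener("sourceopen", async () => {
  const sourceBuffer = mediaSource.addSourceBuffer("audio/mpeg");
  for (const url of audios) {
    const chunk = await (await fetch(url)).arrayBuffer();
    sourceBuffer.appendBuffer(chunk);
    // wait for the SourceBuffer to finish ingesting this chunk
    await new Promise<void>((resolve) =>
      sourceBuffer.addEventListener("updateend", () => resolve(), { once: true })
    );
  }
  mediaSource.endOfStream();
});

// call play() from a user gesture (your Play button) to satisfy autoplay policies
playButton.addEventListener("click", () => audioEl.play());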
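And if you go the HLS route in a browser, a library such as hls.js does the segment fetching and buffering for you. A minimal sketch, with a hypothetical playlist URL:

import Hls from "hls.js";

const audioEl = new Audio();
const playlistUrl = "https://example.com/audio/playlist.m3u8"; // hypothetical playlist

if (Hls.isSupported()) {
  // hls.js fetches the segments and feeds them into MediaSource for you
  const hls = new Hls();
  hls.loadSource(playlistUrl);
  hls.attachMedia(audioEl);
} else if (audioEl.canPlayType("application/vnd.apple.mpegurl")) {
  // Safari plays HLS natively
  audioEl.src = playlistUrl;
}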

Brad
  • I'm trying to achieve the same feature SoundCloud and Spotify have; looking at their responses in the network tab, they use the same chunk-by-chunk streaming technique for fast playback – callmenikk Aug 13 '23 at 22:05
  • 1
    @callmenikk Streaming in a chunked manner does not make playback start faster. It actually adds a great deal of overhead to do it that way. Think of it... you could either make a singular request for a stream and the server can send it to you, filling your buffer quickly, allowing you to control the flow via simple backpressure. Or, you could make a hundred independent HTTP requests, repeatedly asking the server, over and over again for what you already asked it for. – Brad Aug 13 '23 at 23:27
  • Then why is Spotify using chunk-by-chunk streaming? They do not really get one whole audio file as an `ArrayBuffer` – callmenikk Aug 13 '23 at 23:36
  • 1
    @callmenikk I don't use Spotify so I don't know what they use or why, but there's certainly no reason to load a whole file into an ArrayBuffer. Obviously that'd be bad for streaming. The easiest thing to do is simply `new Audio('https://example.com/something.webm')`, and let the browser figure it out. I'd assume that Spotify is using MediaSource though, like most sites like them. That gives them more control over buffering, analytics, DRM, etc. Also, they have a ton of listeners so they may actually transcode to multiple bitrates, which is switched client-side. – Brad Aug 13 '23 at 23:50
  • AFAIK a single buffer cannot do the job, for the reasons already mentioned by @Brad. An efficient approach should use more than one buffer, with a multithreaded routine. – pierpy Aug 14 '23 at 08:19