
I am trying to capture microphone audio in a client web browser, live-stream it over a WebSocket to a Node.js server, and then stream it on to a different web browser client.

So far, on the client side, I open a WebSocket connection in JavaScript:

const webSocket = new WebSocket('ws://127.0.0.1:8080');
webSocket.binaryType = 'blob';

Once connected to the server, I capture the audio stream from the user's microphone and, each time a chunk of data becomes available (every second), send it to the server over the WebSocket:

webSocket.onopen = event => {
  console.log('info: connected to server');

  navigator.mediaDevices
    .getUserMedia({ audio: true, video: false })
    .then(stream => {
      const mediaRecorder = new MediaRecorder(stream, {
        mimeType: 'audio/webm',
      });

      // Forward every recorded chunk to the server as it becomes available.
      mediaRecorder.addEventListener('dataavailable', event => {
        if (event.data.size > 0) {
          webSocket.send(event.data);
        }
      });

      // Emit a chunk of recorded audio every 1000 ms.
      mediaRecorder.start(1000);
    });
};

Now, on the server side, using the ws module, I receive each Blob and forward it to every other connected client:

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', ws => {
  console.log('info: client connected');

  ws.on('message', message => {
    // Relay the incoming chunk to every other connected client.
    wss.clients.forEach(client => {
      if (client !== ws && client.readyState === WebSocket.OPEN) {
        client.send(message);
      }
    });
  });
});

Back on the client side, I try to play the audio using an <audio> element referenced by audioEl:

webSocket.onmessage = event => {
  audioEl.src = window.URL.createObjectURL(event.data);
  audioEl.play();
};

Now, I understand that this only works for the first chunk of data (and it does work), because audioEl.play() is asynchronous. What I am doing here is replacing the audio element's blob URL every second, whenever a new Blob arrives over the WebSocket.

After a week of research, I have only found solutions for streaming audio from a server to a client, or for recording audio, stopping the recording, and then sending the whole thing as a single Blob.

I also tried sending the raw samples from an AudioBuffer using the Web Audio API, but I don't know how to process them back into playable audio on the receiving side:

const context = new AudioContext();
const source = context.createMediaStreamSource(stream);
const processor = context.createScriptProcessor(1024, 1, 1);

source.connect(processor);
processor.connect(context.destination);

processor.onaudioprocess = function (e) {
  // Send the raw PCM samples for this block; the AudioBuffer object
  // itself cannot be serialized over a WebSocket.
  webSocket.send(e.inputBuffer.getChannelData(0));
};
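
I assume the receiving side would then need something roughly like the following to turn those raw samples back into sound, but I have not been able to make it work (this is only a sketch; it assumes the sample rates match and it does not schedule the chunks back to back):

const playbackContext = new AudioContext();

webSocket.onmessage = async event => {
  // event.data is a Blob because binaryType is 'blob'; read its bytes
  // and reinterpret them as the Float32 samples that were sent.
  const samples = new Float32Array(await event.data.arrayBuffer());

  // Wrap the samples in a one-channel AudioBuffer...
  const audioBuffer = playbackContext.createBuffer(1, samples.length, playbackContext.sampleRate);
  audioBuffer.copyToChannel(samples, 0);

  // ...and play it immediately. Real code would have to queue chunks
  // and schedule them seamlessly one after another.
  const source = playbackContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(playbackContext.destination);
  source.start();
};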

What I am trying to achieve is that a user speaks into their microphone, the audio is live-streamed to the server and on to another user, and is played there in near real time.

If my approach of sending a Blob every second is correct, how can I make the code play the audio continuously? Maybe I need to create some buffers that I don't know about. Or, if the approach is totally wrong, please point me to a correct one.

Using WebRTC for peer-to-peer communication is not an option for me because I don't want the overhead of a STUN or TURN server.

Akshit Mehra
What you are saying is that you prefer always relaying data via a WebSocket/TCP server to sometimes relaying via a TURN (typically UDP) server. Reconsider your options. – Philipp Hancke Jul 26 '20 at 19:59

1 Answer


MediaRecorder passes chunks of data to your dataavailable event handler. For those chunks to be useful, they must be played in order. They are chunks of a single media file, usually in .webm format (a flavor of the Matroska container). They don't stand alone, except for the first one.

So, if you pass them to another browser as WebSocket payloads, they really can't be played individually.
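
To make that concrete, here is a rough, untested sketch of a receiving side (reusing the audioEl from your question): a later chunk by itself is not a valid WebM file, but all of the chunks concatenated in arrival order are, because together they re-create the file MediaRecorder was writing.

const chunks = [];

webSocket.onmessage = event => {
  chunks.push(event.data);
  // new Blob([chunks[3]]) on its own would not play;
  // new Blob(chunks) (every chunk, in order) will.
  audioEl.src = URL.createObjectURL(new Blob(chunks, { type: 'audio/webm' }));
  audioEl.play();
  // Playback restarts on every message and the object URLs are never
  // revoked, so this is only an illustration, not a real solution.
};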

You could try to parse the WebM data in the receiving browser and contrive to play it from your WebSocket's message events. There's an npm package called ebml that can help with that. If you go for that solution, look for "how to decode Opus audio in a browser." I've done this for video. It's a pain in the xxx neck to develop and debug. (I only did it because some users needed to use a Redmond Middle School Science Project -- that is, Microsoft Internet Explorer -- to render low-latency video. I could have bought all those users new computers for what it cost to develop that.)
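
A very rough sketch of that idea, assuming you bundle the ebml package for the browser (the event shape and element names below are from my reading of its docs, so treat them as approximate rather than authoritative):

import { Decoder } from 'ebml';

const decoder = new Decoder();

decoder.on('data', ([kind, element]) => {
  // SimpleBlock elements inside each Matroska Cluster carry the actual
  // Opus frames; those are what you would decode and schedule yourself.
  if (kind === 'tag' && element.name === 'SimpleBlock') {
    // element.data holds the block bytes (a short header plus one Opus packet).
  }
});

webSocket.onmessage = async event => {
  // Feed every received chunk, in order, into the EBML parser.
  // (Decoder is a Node stream, so a Buffer polyfill may be needed in the browser.)
  decoder.write(new Uint8Array(await event.data.arrayBuffer()));
};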

It's strange, but true, that the way the WebRTC communications stack packetizes audio is drastically different from the way MediaRecorder does it.

(For what it's worth, there's a vendor called xirsys.com that provides STUN/TURN servers. They have a generous free tier for development and low-volume work, which is worth considering. I've had good success with them at the development stage.)

O. Jones