I would like to accomplish something like this in the browser but plugin-free.

It's described in a fully functional Linux command line and in pseudocode:

$ arecord -d 5 -f dat -c 1 -f S16_LE -r 8000 |\
  speexenc - - |\
  curl -X PUT "https://api.example.com/record-sound" -T -

Pseudocode:

  1. Open the mic,
  2. Open the connection to the API endpoint
  3. Read bytes from the mic at a rate of 8000 Hz (WAVE PCM, signed 16-bit little-endian, mono, 8000 Hz)
  4. Encode the bytes to Speex in chunks (http://speex.org)
  5. Write the byte chunks you get from the Speex encoder to the web service request body in chunks; this way a lot of bandwidth is saved and bytes are sent as soon as they're available
  6. When the mic stops giving results (either by a timer, -d 5, or stopped by the user), flush the request body and send the request.
  7. Wait for JSON (or anything else) results in response body
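
Steps 3 to 6 above boil down to buffering incoming samples into fixed-size frames, encoding each frame, and forwarding it right away. Here is a minimal TypeScript sketch of just that framing logic. The `FrameChunker` class and its `encode`/`send` callbacks are hypothetical names, not part of Speex or of any browser API, and the frame size mentioned in the comments assumes 20 ms frames at 8000 Hz (160 samples, the Speex narrowband frame size).

```typescript
// Stand-ins for a real encoder (e.g. Speex) and a real transport (request body, socket).
type Encoder = (frame: Int16Array) => Uint8Array;
type Sender = (chunk: Uint8Array) => void;

// Hypothetical helper: groups 16-bit PCM samples into fixed-size frames and
// passes each complete frame to the encoder, then to the sender, immediately.
class FrameChunker {
  private frameSize: number; // e.g. 160 samples = 20 ms at 8000 Hz (assumed)
  private encode: Encoder;
  private send: Sender;
  private pending: number[] = [];

  constructor(frameSize: number, encode: Encoder, send: Sender) {
    this.frameSize = frameSize;
    this.encode = encode;
    this.send = send;
  }

  // Steps 3-5: accept whatever the mic delivered and emit every complete frame.
  push(samples: Int16Array): void {
    for (let i = 0; i < samples.length; i++) {
      this.pending.push(samples[i]);
    }
    while (this.pending.length >= this.frameSize) {
      const frame = Int16Array.from(this.pending.splice(0, this.frameSize));
      this.send(this.encode(frame));
    }
  }

  // Step 6: zero-pad and emit the last partial frame, if there is one.
  flush(): void {
    if (this.pending.length === 0) return;
    const frame = new Int16Array(this.frameSize);
    frame.set(this.pending);
    this.pending = [];
    this.send(this.encode(frame));
  }
}
```

Keeping the encoder and the sender as plain callbacks means the same loop would work whether the chunks end up in a WebSocket, a chunked request body, or a test buffer.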

Obviously this is a Linux command line, but the algorithm has been successfully applied on:

  1. Linux - using the command above
  2. iOS - iPhone and iPad
  3. Java2SE on Windows

I would like to achieve the same goal, "efficient and fast streaming from microphone to a server", in the browser (old browsers are not my concern now). It doesn't have to be a REST server; WebSockets or whatever else is possible would be just as good. A Node.js-based solution is preferred.

What I have tried so far:

  • Successfully used VideoIO.swf, a Flash-based solution, with RTMPLite on the server to receive video. It's actually a very good solution in terms of functionality, except that:
    • It requires user approval through a very bad UI
    • It's not a standard way of doing things on the web and doesn't allow good error detection and such
  • Successfully used getUserMedia to upload the sound through recorder.js and XHR2. It works fine, except that:
    • It is actually slow, because XHR2 needs the full data before the request can be made using xhr.send(data)
    • It is even slower since I could only upload the WAV format, which is much bigger than Ogg or Opus
  • Researched the use of WebRTC but couldn't find any resource that mentioned anything other than peer-to-peer connections
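
One way around the xhr.send(data) limitation in the list above would be to push each chunk over an already open socket as soon as it exists. The sketch below is written against a minimal `BinarySink` interface (a hypothetical name) so that it can run outside a browser; a browser `WebSocket`, whose `send` method accepts typed arrays, could be passed in its place. Sending an empty message as the end-of-stream marker is an assumed convention that the server would have to agree on.

```typescript
// Hypothetical minimal transport interface; anything with a compatible send() fits.
interface BinarySink {
  send(data: Uint8Array): void;
}

// Forwards encoded chunks one by one instead of collecting a full recording first.
class ChunkUploader {
  private sink: BinarySink;
  private bytesSent = 0;
  private finished = false;

  constructor(sink: BinarySink) {
    this.sink = sink;
  }

  // Send one chunk immediately and keep a running byte count.
  write(chunk: Uint8Array): void {
    if (this.finished) throw new Error("upload already finished");
    this.sink.send(chunk);
    this.bytesSent += chunk.length;
  }

  // Signal end of stream with an empty message (assumed convention)
  // and report how many payload bytes were sent.
  finish(): number {
    if (!this.finished) {
      this.finished = true;
      this.sink.send(new Uint8Array(0));
    }
    return this.bytesSent;
  }
}
```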
Omar Al-Ithawi

1 Answer

I've asked a related question "How can I use Opus Codec from JavaScript" to seek other alternatives, and the answers there apply here to some extent:

Brad:

Unfortunately, it isn't currently possible to access browser codecs directly from JavaScript for encoding. The only way to do it would be to utilize WebRTC and set up recording on the server. I've tried this by compiling libjingle with some other code out of Chromium to get it to run on a Node.js server... it's almost impossible.

The only thing you can do currently is send raw PCM data to your server. This takes up quite a bit of bandwidth, but you can minimize that by converting the float32 samples down to 16 bit (or 8 bit if your speech recognition can handle it).

Hopefully the media recorder API will show up soon so we can use browser codecs.
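
The float32-to-16-bit conversion Brad mentions can be sketched in a few lines of TypeScript. This is a minimal illustration, assuming the input samples are floats in the range [-1, 1] (as the Web Audio API delivers them); the function name `floatTo16BitPCM` is just an illustrative choice.

```typescript
// Convert float samples in [-1, 1] to signed 16-bit PCM.
// Each sample shrinks from 4 bytes to 2, halving the payload.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, input[i]));
    // Negative values scale to -32768, positive values to 32767.
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return output;
}
```

The resulting `Int16Array` uses the platform's byte order, which is little-endian on practically all current hardware; writing through a `DataView` would make the byte order explicit if that matters for the server.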

OmarIthawi:

One way to do it is to compile Opus to JavaScript with Emscripten and hope that your PC can handle encoding in JavaScript. Another alternative is to use speex.js.

Omar Al-Ithawi