I have access to IBM Watson's Speech-To-Text API, which allows streaming via WebSockets, and I'm able to call getUserMedia()
to capture a microphone MediaStream in the browser, but now I need to work out the best way to stream that audio to the server in real time.
I intend to chain two WebSocket connections, browser <=> my server <=> Watson,
using my server as a relay for CORS reasons.
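To make the relay idea concrete, here's a minimal sketch of what I have in mind on the server side, assuming Node.js with the ws package; the Watson endpoint URL and authentication below are placeholders, since the relay itself just forwards frames in both directions:

var WebSocket = require('ws');

var wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', function (browserSocket) {
  // One upstream Watson connection per browser client (URL/auth are placeholders)
  var watsonSocket = new WebSocket('wss://example.com/watson-speech-to-text');

  // Forward binary audio frames from the browser up to Watson
  browserSocket.on('message', function (data) {
    if (watsonSocket.readyState === WebSocket.OPEN) {
      watsonSocket.send(data);
    }
  });

  // Relay transcription results from Watson back down to the browser
  watsonSocket.on('message', function (data) {
    if (browserSocket.readyState === WebSocket.OPEN) {
      browserSocket.send(data);
    }
  });

  // Tear down the pair together
  browserSocket.on('close', function () { watsonSocket.close(); });
  watsonSocket.on('close', function () { browserSocket.close(); });
});

(A real version would presumably need to buffer audio that arrives before the Watson socket opens, but the shape is the same.)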
I have been looking at WebRTC and various experiments, but all of these seem to be inter-browser peer-to-peer rather than client-to-server as I intend.
The only other examples I've come across (e.g. RecordRTC) seem to be based around recording a WAV or FLAC file from the MediaStream
returned by getUserMedia()
and then sending the whole file to the server, but that approach has two problems (I sketch a possible workaround after this list):
- The user shouldn't have to press a start or a stop button; the app should be able to listen at all times.
- Even if I make a recording and stop it when there's a period of silence, there will be an unreasonable delay between the user speaking and getting a response from the server.
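One way I can imagine working around both problems is MediaRecorder's timeslice parameter, which makes the browser emit small compressed chunks continuously instead of one file at the end. This is only a rough sketch: browser support for MediaRecorder varies, and wss://example.com/relay stands in for my relay server.

navigator.mediaDevices.getUserMedia({ audio: true })
  .then(function (mediaStream) {
    var ws = new WebSocket('wss://example.com/relay'); // placeholder relay endpoint
    var recorder = new MediaRecorder(mediaStream);

    // Fire dataavailable every 250 ms instead of once at stop()
    recorder.ondataavailable = function (event) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(event.data); // Blob containing a chunk of compressed audio
      }
    };

    recorder.start(250); // runs continuously; no user-facing start/stop button
  });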
I'm making a proof of concept and, if possible, I'd like this to work on as many modern browsers as it can - but most importantly, on mobile browsers. iOS seems to be out of the question on this one, though.
http://caniuse.com/#feat=stream
http://caniuse.com/#search=webrtc
Let's assume I just have this code for now:
// Shimmed with https://raw.githubusercontent.com/webrtc/adapter/master/adapter.js
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(function (mediaStream) {
    // Continuously send raw or compressed microphone data to the server
    // Continuously receive speech-to-text results back
  }, function (err) {
    console.error(err);
  });
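For the raw option, the shape I have in mind is tapping the MediaStream with the Web Audio API's ScriptProcessorNode, converting the Float32 samples to 16-bit PCM, and sending each buffer over the WebSocket as it arrives. Again only a sketch: the relay URL is a placeholder, and Watson would presumably also need to be told the sample rate (audioContext.sampleRate) and encoding.

navigator.mediaDevices.getUserMedia({ audio: true })
  .then(function (mediaStream) {
    var ws = new WebSocket('wss://example.com/relay'); // placeholder relay endpoint
    ws.binaryType = 'arraybuffer';

    var audioContext = new (window.AudioContext || window.webkitAudioContext)();
    var source = audioContext.createMediaStreamSource(mediaStream);
    var processor = audioContext.createScriptProcessor(4096, 1, 1);

    processor.onaudioprocess = function (event) {
      var float32 = event.inputBuffer.getChannelData(0);
      var int16 = new Int16Array(float32.length);
      for (var i = 0; i < float32.length; i++) {
        // Clamp and scale [-1, 1] floats to signed 16-bit integers
        var s = Math.max(-1, Math.min(1, float32[i]));
        int16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
      }
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(int16.buffer);
      }
    };

    source.connect(processor);
    // The node outputs silence here, but some browsers only fire
    // onaudioprocess while it is connected to a destination
    processor.connect(audioContext.destination);
  }, function (err) {
    console.error(err);
  });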