I am building a WebRTC application in which users can share their camera and their screen. When a client receives a stream/track, it needs to know whether it is a camera stream or a screen recording stream. This distinction is obvious at the sending end, but the distinction is lost by the time the tracks reach the receiving peer.
Here's some sample code from my application:
// Note the distinction between streams is obvious at the sending end.
const localWebcamStream = await navigator.mediaDevices.getUserMedia({ ... });
const screenCaptureStream = await navigator.mediaDevices.getDisplayMedia({ ... });

// This is called by signalling logic
function addLocalTracksToPeerConn(peerConn) {
  // Our approach here loses information because our two distinct streams
  // are added to the PeerConnection's homogeneous bag of streams
  for (const track of screenCaptureStream.getTracks()) {
    peerConn.addTrack(track, screenCaptureStream);
  }
  for (const track of localWebcamStream.getTracks()) {
    peerConn.addTrack(track, localWebcamStream);
  }
}

// This is called by signalling logic
function handleRemoteTracksFromPeerConn(peerConn) {
  peerConn.ontrack = ev => {
    const stream = ev.streams[0];
    if (stream is a camera stream) { // FIXME how to distinguish reliably?
      remoteWebcamVideoEl.srcObject = stream;
    }
    else if (stream is a screen capture) { // FIXME how to distinguish reliably?
      remoteScreenCaptureVideoEl.srcObject = stream;
    }
  };
}
My ideal imaginary API would allow adding a .label to a track or stream, like this:
// On sending end, add arbitrary metadata
track.label = "screenCapture";
peerConn.addTrack(track, screenCaptureStream);
// On receiving end, retrieve arbitrary metadata
peerConn.ontrack = ev => {
  const trackType = ev.track.label; // get the label when receiving the track
};
But this API does not really exist.
There is a MediaStreamTrack.label property, but it's read-only and not preserved in transmission. By experimentation, the .label property at the sending end is informative (e.g. label: "FaceTime HD Camera (Built-in) (05ac:8514)"). But at the receiving end, the .label for the same track is not preserved. (It appears to be replaced with the .id of the track, in Chrome at least.)
This article by Kevin Moreland describes the same problem, and recommends a mildly terrifying solution: munge the SDP on the sending end, and then grep the SDP on the receiving end. But this solution feels very fragile and low-level.
I know there is a MediaStreamTrack.id property. There is also a MediaStream.id property. Both of these appear to be preserved in transmission. This means I could send the metadata in a side-channel, such as the signalling channel or a DataChannel. From the sending end, I would send { "myStreams": { "screen": "<some stream id>", "camera": "<another stream id>" } }. The receiving end would wait until it has both the metadata and the stream before displaying anything. However, this approach introduces a side-channel (and the inevitable concurrency challenges that come with it), where a side-channel feels unnecessary.
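To make the side-channel idea concrete, here is a rough sketch of what I have in mind. It only covers the bookkeeping: the receiving end buffers whichever of the two (stream or metadata) arrives first, and hands the stream over once it has both. The names buildStreamMetadata, createStreamRouter and the onStream callback are made up for illustration, and the sketch assumes the stream .id really is the same on both ends, which I have only observed and not confirmed.

```javascript
// Sending end: describe which stream id plays which role.
// The message shape mirrors the JSON example in the question.
function buildStreamMetadata(screenStream, cameraStream) {
  return { myStreams: { screen: screenStream.id, camera: cameraStream.id } };
}

// Receiving end: pair streams with metadata regardless of arrival order.
// onStream(role, stream) is a hypothetical callback supplied by the caller.
function createStreamRouter(onStream) {
  const roleByStreamId = new Map(); // stream id -> "screen" | "camera"
  const pendingStreams = new Map(); // stream id -> stream waiting for metadata
  const delivered = new Set();      // stream ids already handed to onStream

  function deliver(stream) {
    // ontrack fires once per track, so the same stream can show up twice
    // (audio + video); only report it the first time.
    if (delivered.has(stream.id)) return;
    delivered.add(stream.id);
    onStream(roleByStreamId.get(stream.id), stream);
  }

  return {
    // Call with the parsed side-channel message.
    handleMetadata(message) {
      for (const [role, streamId] of Object.entries(message.myStreams)) {
        roleByStreamId.set(streamId, role);
        const waiting = pendingStreams.get(streamId);
        if (waiting) {
          pendingStreams.delete(streamId);
          deliver(waiting);
        }
      }
    },
    // Call with ev.streams[0] from the ontrack handler.
    handleStream(stream) {
      if (roleByStreamId.has(stream.id)) {
        deliver(stream);
      } else {
        pendingStreams.set(stream.id, stream);
      }
    },
  };
}
```

In the browser, I would wire this up by calling router.handleStream(ev.streams[0]) from peerConn.ontrack and router.handleMetadata(JSON.parse(ev.data)) from the DataChannel's onmessage handler, with the onStream callback setting srcObject on the matching video element. It works on paper, but it is exactly the extra buffering logic I was hoping to avoid.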
I'm looking for an idiomatic, robust solution. How do I label/identify MediaStreams at the sending end, so that the receiving end knows which stream is which?