
From the Mozilla MDN page: https://developer.mozilla.org/en-US/docs/Web/API/Media_Streams_API

"A MediaStream consists of zero or more MediaStreamTrack objects, representing various audio or video tracks. Each MediaStreamTrack may have one or more channels. The channel represents the smallest unit of a media stream, such as an audio signal associated with a given speaker, like left or right in a stereo audio track."

That clarifies what a channel is.

Several recent RFCs (e.g. RFC 8108) refer to the need to have multiple streams sent in one RTP session, where each stream has its own SSRC at the RTP level. In the Unified Plan RFC too, the reference is always to a stream as the lowest level (not tracks or channels). In RFC 3550, the base RTP RFC, there is no reference to channels.

Is the RTP stream referred to in these RFCs, which treat the stream as the lowest-level source of media, the same thing as a channel as that term is used in WebRTC and in the quote above? Is there a one-to-one mapping between the channels of a track (WebRTC) and an RTP stream with an SSRC?

A webcam, for example, generates a media stream, which can have an audio media track and a video media track; each track is transported in RTP packets using a separate SSRC, resulting in two SSRCs. Is that correct? Now what if there is a stereo webcam (or some such device with, let's say, two microphones, i.e. two channels)? Will this generate three RTP streams with three different unique SSRCs?

Is there a single RTP session for the five-tuple connection established after a successful test of ICE candidates? Or can there be multiple RTP sessions over the same IP/port/UDP connection between peers?

Any document that clarifies this would be appreciated.


1 Answer

That clarifies what a channel is.

Not quite. Only audio tracks have channels. Unless you use web audio to split an audio MediaStreamTrack into its individual channels, the track is the lowest level with regard to peer connections. *

That's because multiple audio channels, much like the multiple frames of video, are part of the payload that gets encoded and decoded by codecs. Practically speaking, you can use a web audio splitter on the receiver's MediaStreamTrack to split up the audio channels, provided they survived.

*) There's also data channels, but those are different, and have no relation to media streams and tracks.
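For illustration, here's a minimal sketch of that splitter approach, assuming track is the stereo audio MediaStreamTrack you received (the names ctx and leftOut are just placeholders):

// Sketch only: split a received stereo track into per-channel tracks using web audio.
// Assumes `track` is the audio MediaStreamTrack from the receiver.
const ctx = new AudioContext();
const source = ctx.createMediaStreamSource(new MediaStream([track]));
const splitter = ctx.createChannelSplitter(2);
source.connect(splitter);

// Route channel 0 into its own stream; channel 1 works the same way.
const leftOut = ctx.createMediaStreamDestination();
splitter.connect(leftOut, 0);
const leftTrack = leftOut.stream.getAudioTracks()[0]; // a track carrying just that channel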

Is the RTP stream ... the same as channels as that term is used in WebRTC, and as referenced above?

No. Roughly speaking, you can say:

RTP stream == MediaStreamTrack.

But that's not the entire story, because of sender.replaceTrack(withTrack). In short, you can replace a track that's being sent with a different track anytime during a live call, without needing to renegotiate your connection. Importantly, the other side's receiver.track does not change in this case, only its output does. This separates the pipe from the content that goes through it.

So on the sending side, it's more fair to say:

RTP stream == the current output of a sender (from pc.getSenders())

...whereas on the receiving side, it's simpler, and always true to say:

RTP stream == receiver.track

Makes sense?
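As a rough sketch of both sides (in an async context, assuming pc is an established RTCPeerConnection already sending a camera track):

// Sending side (sketch): swap the outgoing video without renegotiating.
const [sender] = pc.getSenders().filter(s => s.track && s.track.kind == "video");
const screen = await navigator.mediaDevices.getDisplayMedia({video: true});
await sender.replaceTrack(screen.getVideoTracks()[0]); // same RTP stream, new content

// Receiving side (sketch): each incoming RTP stream surfaces as receiver.track,
// and that track object stays the same even after the remote replaceTrack().
pc.ontrack = ({track, receiver, streams}) => {
  console.log(track === receiver.track); // true
};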

What about MediaStream?

In modern WebRTC, MediaStreams are dumb containers; you may add or remove tracks from them as you please using stream.addTrack(track) and stream.removeTrack(track). RTCPeerConnection deals solely with tracks. E.g.:

// Add each of the stream's tracks to the peer connection. The stream is passed
// along only so the remote side can group the resulting tracks back into a
// MediaStream in its track event.
for (const track of stream.getTracks()) {
  pc.addTrack(track, stream);
}

Is there a one-to-one mapping between channels of a track and an RTP stream with an SSRC?

Between a MediaStreamTrack and SSRC, yes.

A webcam, [...] can have a audio media track and a video media track, each track is transported in RTP packets using a separate SSRC, resulting in two SSRCs. Is that correct?

Yes. In this case there will always be two SSRCs, because an audio track and a video track can never share the same RTP stream.
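One way to see this for yourself is a stats check (a sketch, in an async context, assuming pc is a connected RTCPeerConnection): with one audio track and one video track you should find two outbound-rtp entries, each with its own ssrc.

const stats = await pc.getStats();
for (const report of stats.values()) {
  if (report.type == "outbound-rtp") {
    // One report per outgoing RTP stream, i.e. one per sent track.
    console.log(report.kind, report.ssrc);
  }
}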

Now what if there is a stereo webcam

No difference. A stereo audio track is still a single audio track (and a single RTP stream).
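You can convince yourself of that by asking for a stereo microphone (a sketch, in an async context; whether you actually get two channels depends on the device and browser):

const stream = await navigator.mediaDevices.getUserMedia({
  audio: {channelCount: {ideal: 2}} // request stereo capture
});
console.log(stream.getAudioTracks().length); // 1, still a single track
console.log(stream.getAudioTracks()[0].getSettings().channelCount); // 2, if supported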

Or can there be multiple RTP sessions over the same set of port-ip-UDP connection between peers?

Not at the same time. But multiple tracks can share the same session, unless you use the non-default:

new RTCPeerConnection({bundlePolicy: 'max-compat'});

If you don't, or use any other bundlePolicy, then tracks may be bundled into a single RTP session over the same transport; each track still keeps its own SSRC.
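A rough way to observe this (a sketch; sender.transport may be null before the connection is up) is to check whether your senders share a transport:

// With the default bundlePolicy, bundled senders report the same RTCDtlsTransport,
// i.e. everything flows over the same five-tuple. With 'max-compat' against a
// non-bundle-aware peer, they may end up on separate transports.
for (const sender of pc.getSenders()) {
  console.log(sender.track && sender.track.kind, sender.transport);
}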

  • So an "audio" channel (e.g. input from microphones placed at different locations) is just another MediaStreamTrack = RTP stream. And as multiple RTP streams can be synchronized inside an RTP session, that makes sense? Regarding your statement: "If you don't, or use any other mode, then same-kind tracks may be bundled on the same SSRC." I think that bundled streams result in different SSRCs for each RTP stream (m= entry in SDP). These streams only share the same ICE candidates and the same RTP session? [RFC](https://tools.ietf.org/html/draft-ietf-mmusic-sdp-bundle-negotiation-53) – asinix Dec 08 '18 at 05:20
  • @RTC No, a stereo audio track is still just one track, i.e. one RTP stream. – jib Dec 08 '18 at 05:30
  • Maybe my understanding or definition of a stereo audio track is wrong. A stereo audio track, as I understand it, consists of multiple audio sources (e.g. left and right microphones). If they are part of one RTP stream and therefore share the same SSRC, how can they be split at the receiving end (e.g. to be played on the left and right speakers)? If by stereo audio track you mean the specific stream from one microphone, then I get it. To understand the different RFCs, I need to get these basics clear. Thanks again... – asinix Dec 08 '18 at 06:44
  • @RTC Same way frames of video are split apart at the receiver. This information is part of the payload that gets encoded and decoded by the codecs. Practically speaking you can use a web audio [splitter](https://developer.mozilla.org/en-US/docs/Web/API/BaseAudioContext/createChannelSplitter) on the receiver's `MediaStreamTrack` to [split up the audio channels](https://blog.mozilla.org/webrtc/channelcount-microphone-constraint), provided they survived. – jib Dec 08 '18 at 12:39
  • I think I understand now. So the channelSplitter API (combined with possibly other APIs) is the one that may be able to split one multi-channel MediaStreamTrack into multiple MediaStreamTracks, one per channel. Had to do some reading on sampling, stereo sampling and even stereo microphones! I thought stereo input always required multiple devices, one for each channel. No wonder all the RFCs refer to devices as the source of a unique SSRC of an RTP stream and never mention channels. Thanks for your detailed response. – asinix Dec 09 '18 at 13:40