Same media format for audio and video on RTSP

Question

Our company develops camera surveillance software. We mainly use RTSP to communicate with the devices (though we support any protocol required), and we have developed our own RTSP client and parsers.

Today, while integrating a new camera, we found an interesting scenario: the camera maps dynamic payload type 96 to both audio and video packets. See the SDP description:

RTSP/1.0 200 OK
CSeq: 2
Date: Sat, Jan 01 2000 19:39:38 GMT
Content-Base: rtsp://10.1.39.174:8557/PSIA/Streaming/channels/2?videoCodecType=H.264/
Content-Type: application/sdp
Content-Length: 830

v=0
o=- 946754247689123 1 IN IP4 10.1.39.174
s=RTSP/RTP stream from IPNC
i=2?videoCodecType=H.264
t=0 0
a=tool:LIVE555 Streaming Media v2010.07.29
a=type:broadcast
a=control:*
a=range:npt=0-
a=x-qt-text-nam:RTSP/RTP stream from IPNC
a=x-qt-text-inf:2?videoCodecType=H.264
m=video 0 RTP/AVP 96
c=IN IP4 0.0.0.0
b=AS:4000
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1;profile-level-id=64001F;sprop-parameter-sets=Z2QAKK2EBUViuKxUdCAqKxXFYqOhAVFYrisVHQgKisVxWKjoQFRWK4rFR0ICorFcVio6ECSFITk8nyfk/k/J8nm5s00IEkKQnJ5Pk/J/J+T5PNzZprQCgC3YCqQAAAMABAAAAwJZgQAB6EgAAiVQve+F4RCNQAAAAAE=,aO48sA==
a=control:track1
m=audio 0 RTP/AVP 96
c=IN IP4 0.0.0.0
b=AS:128
a=rtpmap:96 PCMU/16000
a=control:track2
m=application 0 RTP/AVP 96
c=IN IP4 0.0.0.0
b=AS:64
a=rtpmap:96 vnd.onvif.metadata/90000
a=control:track3

As you can see:

m=video 0 RTP/AVP 96
m=audio 0 RTP/AVP 96

The problem is that we use this mapping to identify the codec of received RTP packets. I have always thought that each media stream would have a different mapping, like 96 for video and 97 for audio (or even a static mapping such as 0 for PCMU), but this device uses the same mapping for every media stream. As a result our parser will not work: it will treat the audio packets received with payload type 96 as video packets and send them straight to the video decoder.

I have checked that VLC works fine. I strongly suspect that VLC does not use this mapping to split the packets, and instead uses the channel identifiers (over TCP) or the different UDP ports to work out which packets belong to which media stream. But we have already built our architecture to split the packets by payload type.

So my question is: is it valid to map both audio and video to the same dynamic payload number (96)?

This is the first time I have come across this issue. I need to know whether we have to change our whole RTSP client to identify the media by channel instead of by payload type, or whether this is an implementation bug on the camera side and they should have assigned a different payload number to each media stream (96 video, 97 audio, 98 application...).

Does anyone know whether this practice (using the same payload number for all media) is valid?

We implemented the RTSP client and SDP parsers from the RFC specifications, but I didn't find anything about using the same payload number for all media. All the examples assign a different payload number to each media stream.

score 1 · Answer 1 · answered Mar 07 '13 at 02:50

This is a very nice question. Judging from the SDP you posted, and in particular the presence of the a=control attributes, this camera implements RTSP as specified in RFC 2326.

In that specification, each media description carries its own control attribute, and the first (session-level) control statement is a=control:*. Based on page 83 of the specification, I think the audio and video streams could be set up as

audio = rtsp://10.1.39.174:8557/PSIA/Streaming/channels/track2

and

video = rtsp://10.1.39.174:8557/PSIA/Streaming/channels/track1
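
For what it's worth, the two URLs above are what generic relative-URL resolution produces from the Content-Base and the a=control values in the posted SDP. Here is a minimal Python sketch of that step. `resolve_control` is a hypothetical helper name, not something from the question's client. Whether this particular camera accepts these URLs, or instead expects the track name appended to the full Content-Base (query string included), is an assumption you would need to check against the device.

```python
from urllib.parse import urljoin

def resolve_control(content_base: str, control: str) -> str:
    """Resolve an a=control value against the base URL (hypothetical helper)."""
    if control == "*":
        # "*" refers to the base URL itself (the aggregate control URL)
        return content_base
    # Absolute control URLs pass through unchanged; relative ones are
    # resolved with the generic relative-reference rules.
    return urljoin(content_base, control)

# Content-Base copied from the DESCRIBE response in the question
CONTENT_BASE = "rtsp://10.1.39.174:8557/PSIA/Streaming/channels/2?videoCodecType=H.264/"

video_url = resolve_control(CONTENT_BASE, "track1")
audio_url = resolve_control(CONTENT_BASE, "track2")
print(video_url)
print(audio_url)
```

Each of these URLs would then get its own SETUP request, so every track ends up with its own transport (port pair or interleaved channel pair).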

score 0 · Answer 2 · edited Oct 07 '21 at 13:39

Good question. The range 96-127 is reserved for dynamic payload types, and the RFC does not say whether the types used must be unique across multiple media descriptions. Things would certainly be clearer if they were unique, but it seems they do not have to be. The payload types are not mixed, because each one is defined under its own media announcement, so using 96 for video and 96 for audio looks valid. Besides, if real-world devices define sessions this way, RTSP clients should be ready for it.
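
To illustrate the "defined under its own media announcement" point, here is a rough Python sketch that keeps a separate payload-type table per m= section instead of one global table. The function name and the dictionary layout are my own assumptions for illustration, and the SDP is trimmed to the lines that matter from the question.

```python
def parse_media_sections(sdp: str):
    """Return one dict per m= section, each with its own payload-type map."""
    sections = []
    current = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            media, _port, _proto, *formats = line[2:].split()
            current = {"media": media, "formats": formats,
                       "rtpmap": {}, "control": None}
            sections.append(current)
        elif current is None:
            continue  # session-level line, not needed for this sketch
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            current["rtpmap"][int(pt)] = encoding
        elif line.startswith("a=control:"):
            current["control"] = line[len("a=control:"):]
    return sections

# Trimmed copy of the SDP from the question
SDP = """\
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
a=control:track1
m=audio 0 RTP/AVP 96
a=rtpmap:96 PCMU/16000
a=control:track2
m=application 0 RTP/AVP 96
a=rtpmap:96 vnd.onvif.metadata/90000
a=control:track3
"""

sections = parse_media_sections(SDP)

# Key the codec lookup by (track, payload type) rather than payload type alone
codec_by_track = {
    (s["control"], pt): enc
    for s in sections
    for pt, enc in s["rtpmap"].items()
}
for key, enc in codec_by_track.items():
    print(key, "->", enc)
```

With the lookup keyed this way, payload type 96 is no longer ambiguous, as long as the client knows which track a packet arrived on.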

answered Feb 26 '13 at 15:20

Roman R.

The thing is that I have already seen many things, and I don't doubt that this Chinese manufacturer has some issues with RTP streaming... but it would be nice to know whether the payload type must be unique across the whole announcement or not – Eric Feb 26 '13 at 15:49
RFC 4566 sections 5.14 and 6 say it's a "media level attribute", so I would say this SDP is valid. – Roman R. Feb 26 '13 at 15:57
I would say the original SDP is valid. The Media attribute tells you audio or video. 'm=' – Jay Mar 12 '13 at 01:45

score 0 · Answer 3 · answered Feb 26 '13 at 21:11

The SDP above is valid in my opinion. I have seen media descriptions use the same payload numbers for audio and video channels.

A couple of ideas:

1. See if you can ask this camera to stream only audio or only video. That way you could have two RTSP sessions (one for audio, one for video), you would know exactly what kind of RTP traffic is coming your way, and you could pick the audio or video decoder accordingly.

2. If this is a really big lift on your side, check whether the incoming RTP packets carry any other information that would let you infer whether a packet belongs to the audio or the video channel.
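
On the second idea: with RTP over TCP, RFC 2326 wraps each packet in a small frame ('$', a one-byte channel id, a two-byte length), so the channel id is already there as extra information. Below is a rough Python sketch of routing by channel first and only then looking at the payload type. The channel numbers 0/2/4 and the table names are assumptions for illustration. In a real session the channel numbers come from the interleaved= parameter of each SETUP response, and RTSP replies can be mixed into the same byte stream, which this sketch does not handle.

```python
import struct

# Assumed mapping; normally taken from "interleaved=" in each SETUP response.
CHANNEL_TO_TRACK = {0: "track1", 2: "track2", 4: "track3"}

# Per-track payload-type tables, as declared in the SDP from the question.
RTPMAP = {
    "track1": {96: "H264/90000"},
    "track2": {96: "PCMU/16000"},
    "track3": {96: "vnd.onvif.metadata/90000"},
}

def split_interleaved(buffer: bytes):
    """Yield (channel, packet) for each complete '$'-framed packet."""
    offset = 0
    while offset + 4 <= len(buffer):
        if buffer[offset] != 0x24:  # '$' marks an interleaved frame
            raise ValueError("expected '$' at start of interleaved frame")
        channel, length = struct.unpack_from("!BH", buffer, offset + 1)
        end = offset + 4 + length
        if end > len(buffer):
            break  # incomplete frame, wait for more data
        yield channel, buffer[offset + 4:end]
        offset = end

def rtp_payload_type(packet: bytes) -> int:
    """Return the 7-bit payload type from the fixed RTP header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    return packet[1] & 0x7F  # the top bit of this byte is the marker bit

def route(buffer: bytes):
    """Yield (track, encoding) per RTP packet, resolving the PT per track."""
    for channel, packet in split_interleaved(buffer):
        track = CHANNEL_TO_TRACK.get(channel)
        if track is None:
            continue  # RTCP (odd channels here) or an unknown channel
        pt = rtp_payload_type(packet)
        yield track, RTPMAP[track].get(pt, "unknown")

def fake_rtp(pt: int) -> bytes:
    """Build a 12-byte RTP header (version 2) with no payload, for the demo."""
    return bytes([0x80, pt]) + bytes(10)

def frame(channel: int, packet: bytes) -> bytes:
    """Wrap a packet in the '$' + channel + length framing."""
    return b"$" + struct.pack("!BH", channel, len(packet)) + packet

stream = frame(0, fake_rtp(96)) + frame(2, fake_rtp(96)) + frame(1, bytes(12))
routed = list(route(stream))
print(routed)
```

Over UDP the same idea applies, with the local port the packet arrived on playing the role of the channel id.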
