Our company develops camera surveillance software. We mainly use RTSP to communicate with the devices (but we support any protocol required), and we have developed our own RTSP client and parsers.
Today we were working on the integration of a new camera and found an interesting scenario: the camera maps the dynamic payload type 96 to both the audio and the video stream. See the SDP description:
RTSP/1.0 200 OK
CSeq: 2
Date: Sat, Jan 01 2000 19:39:38 GMT
Content-Base: rtsp://10.1.39.174:8557/PSIA/Streaming/channels/2?videoCodecType=H.264/
Content-Type: application/sdp
Content-Length: 830
v=0
o=- 946754247689123 1 IN IP4 10.1.39.174
s=RTSP/RTP stream from IPNC
i=2?videoCodecType=H.264
t=0 0
a=tool:LIVE555 Streaming Media v2010.07.29
a=type:broadcast
a=control:*
a=range:npt=0-
a=x-qt-text-nam:RTSP/RTP stream from IPNC
a=x-qt-text-inf:2?videoCodecType=H.264
m=video 0 RTP/AVP 96
c=IN IP4 0.0.0.0
b=AS:4000
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1;profile-level-id=64001F;sprop-parameter-sets=Z2QAKK2EBUViuKxUdCAqKxXFYqOhAVFYrisVHQgKisVxWKjoQFRWK4rFR0ICorFcVio6ECSFITk8nyfk/k/J8nm5s00IEkKQnJ5Pk/J/J+T5PNzZprQCgC3YCqQAAAMABAAAAwJZgQAB6EgAAiVQve+F4RCNQAAAAAE=,aO48sA==
a=control:track1
m=audio 0 RTP/AVP 96
c=IN IP4 0.0.0.0
b=AS:128
a=rtpmap:96 PCMU/16000
a=control:track2
m=application 0 RTP/AVP 96
c=IN IP4 0.0.0.0
b=AS:64
a=rtpmap:96 vnd.onvif.metadata/90000
a=control:track3
As you can see:
m=video 0 RTP/AVP 96
m=audio 0 RTP/AVP 96
The problem is that we use this mapping to identify the codec of received RTP packets. I had always assumed that each media would get a different payload type, e.g. 96 for video and 97 for audio (or even a static mapping such as 0 for PCMU), but this device uses the same payload type for all media. Our parser therefore breaks: audio packets arriving with payload type 96 are identified as video and sent straight to the video decoder, which of course does not work.
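To illustrate the ambiguity, here is a minimal sketch (the names are hypothetical, not our real parser) of what happens when a single session-wide payload-type table is built from this SDP:

import re

SDP = """m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
a=control:track1
m=audio 0 RTP/AVP 96
a=rtpmap:96 PCMU/16000
a=control:track2
m=application 0 RTP/AVP 96
a=rtpmap:96 vnd.onvif.metadata/90000
a=control:track3
"""

# Naive approach: one payload-type -> codec table for the whole session.
pt_to_codec = {}
for line in SDP.splitlines():
    m = re.match(r"a=rtpmap:(\d+) ([^/]+)/(\d+)", line)
    if m:
        pt, codec = int(m.group(1)), m.group(2)
        pt_to_codec[pt] = codec   # payload type 96 is overwritten on every m= section

print(pt_to_codec)  # {96: 'vnd.onvif.metadata'} -- the H264 and PCMU entries are lost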
I have checked that VLC plays the stream fine, but I strongly suspect that VLC does not use the payload type to split the packets; instead it uses the interleaved channel identifiers (in TCP mode) or the different UDP ports to decide which packets belong to which media. However, our architecture is already built to split the packets based on the payload type. A rough sketch of channel-based demultiplexing in TCP mode is shown below.
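This is roughly how demultiplexing by interleaved channel works over RTSP/TCP ('$' framing from RFC 2326, section 10.12); the socket and the channel-to-track map here are assumptions for illustration, not part of our code:

import struct

def read_exact(sock, n):
    """Read exactly n bytes from a connected TCP socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("RTSP connection closed")
        buf += chunk
    return buf

def demux_interleaved(sock, channel_to_track):
    """Route '$'-framed RTP/RTCP packets by channel id instead of payload type."""
    while True:
        magic, channel, length = struct.unpack("!BBH", read_exact(sock, 4))
        if magic != 0x24:            # 0x24 == '$', start of an interleaved frame
            raise ValueError("lost interleaved framing, got 0x%02x" % magic)
        packet = read_exact(sock, length)
        track = channel_to_track.get(channel)   # e.g. {0: "video", 2: "audio"}
        if track is not None:
            yield track, packet      # hand the packet to the decoder for that track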
So I ask: is it valid to map both audio and video to the same dynamic payload number (96)?
This is the first time I have come across this issue, and I need to know whether we will have to change our whole RTSP client to identify the media by channel instead of by payload type, or whether this is an implementation bug on the camera side and it should have assigned a different payload number to each media (96 video, 97 audio, 98 application, ...). A sketch of the channel-based alternative follows.
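If we went the channel route, the mapping itself would come from the SETUP responses rather than from the SDP, something along these lines (the Transport header strings below are hypothetical examples, not taken from this camera):

import re

def channels_from_setup(transport_header, track):
    """Extract the interleaved channel pair that a SETUP response assigned to a track."""
    m = re.search(r"interleaved=(\d+)-(\d+)", transport_header)
    if not m:
        return {}
    rtp_ch, rtcp_ch = int(m.group(1)), int(m.group(2))
    return {rtp_ch: track, rtcp_ch: track + "-rtcp"}

# Hypothetical SETUP responses for track1 (video) and track2 (audio):
channel_to_track = {}
channel_to_track.update(channels_from_setup(
    "RTP/AVP/TCP;unicast;interleaved=0-1", "video"))
channel_to_track.update(channels_from_setup(
    "RTP/AVP/TCP;unicast;interleaved=2-3", "audio"))

print(channel_to_track)  # {0: 'video', 1: 'video-rtcp', 2: 'audio', 3: 'audio-rtcp'}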
Does anyone know whether this practice (using the same payload number for all media) is valid?
We implemented the RTSP client and SDP parsers following the RFC specifications, but I did not find anything about reusing the same payload number across media; in all the examples each media is assigned a different payload number.
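From the RFCs, a=rtpmap is a media-level attribute, scoped to the m= section it appears under, so one option would be for our SDP parser to keep one payload-type table per m= section rather than one per session. A rough sketch of that per-track scoping (the function name is just for illustration):

import re

def rtpmaps_per_media(sdp_text):
    """Return one {payload_type: (codec, clock_rate)} table per m= section."""
    tables = []
    current = None
    for line in sdp_text.splitlines():
        if line.startswith("m="):
            current = {}
            tables.append((line.strip(), current))
        elif current is not None:
            m = re.match(r"a=rtpmap:(\d+) ([^/]+)/(\d+)", line)
            if m:
                current[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return tables

# With the SDP above this yields:
#   ('m=video 0 RTP/AVP 96',       {96: ('H264', 90000)})
#   ('m=audio 0 RTP/AVP 96',       {96: ('PCMU', 16000)})
#   ('m=application 0 RTP/AVP 96', {96: ('vnd.onvif.metadata', 90000)})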