I am trying to write a tool that grabs frames from an MJPEG stream transmitted over HTTP. I did not find any specification, so I looked at what Wikipedia says:
In response to a GET request for a MJPEG file or stream, the server streams the sequence of JPEG frames over HTTP. A special mime-type content type multipart/x-mixed-replace;boundary=<boundary-name> informs the client to expect several parts (frames) as an answer delimited by <boundary-name>. This boundary name is expressly disclosed within the MIME-type declaration itself.
But this doesn't seem to be very accurate in practice. I dumped some streams to find out how they behave. Most streams have the following format (where CRLF is a carriage return followed by a line feed, and a "partial header" is a set of header fields without a status line):
1. Status line (e.g. HTTP/1.0 200 OK) CRLF
2. Header fields (e.g. Cache-Control: no-cache) CRLF
3. Content-Type header field (e.g. Content-Type: multipart/x-mixed-replace; boundary=--myboundary) CRLF
4. CRLF (Denotes that the header is over)
5. Boundary (Denotes that the first frame is over) CRLF
6. Partial header fields (mostly: Content-type: image/jpeg) CRLF
7. CRLF (Denotes that this "partial header" is over)
8. Actual frame data CRLF
9. (Sometimes here is an optional CRLF)
10. Boundary
11. Starting again at partial header (line 6)
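To make the layout above concrete, here is a minimal sketch of how I would split an already-received body into parts. It is not a finished parser: the function name iter_parts is my own, it works on a complete in-memory buffer rather than a live socket, and it assumes the delimiter bytes never occur inside the JPEG data.

```python
def iter_parts(body: bytes, delimiter: bytes):
    """Yield (headers, payload) for every complete part in `body`.

    `delimiter` is the exact byte string seen on the wire,
    e.g. b"--myboundary" (an assumption taken from my dumps).
    """
    chunks = body.split(delimiter)
    # chunks[0] is whatever precedes the first boundary (the empty "first frame").
    for chunk in chunks[1:]:
        chunk = chunk.lstrip(b"\r\n")  # CRLF that follows the boundary
        head, sep, payload = chunk.partition(b"\r\n\r\n")
        if not sep:
            continue  # no blank line yet: incomplete part or end of data

        headers = {}
        for line in head.split(b"\r\n"):
            name, colon, value = line.partition(b":")
            if colon:
                headers[name.strip().lower().decode("latin-1")] = (
                    value.strip().decode("latin-1")
                )

        if "content-length" in headers:
            # Trust the declared length when the server sends one.
            payload = payload[: int(headers["content-length"])]
        else:
            # Otherwise drop the one or two CRLFs seen before the next boundary.
            while payload.endswith(b"\r\n"):
                payload = payload[:-2]
        yield headers, payload
```

Header names are lower-cased because my dumps contain both Content-Type and Content-type.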
The first frame never contained actual image data.
All of the analyzed streams had the Content-Type header, with the type set to multipart/x-mixed-replace.
But some of the streams get things wrong here:
Two servers claimed boundary="MOBOTIX_Fast_Serverpush" but then used --MOBOTIX_Fast_Serverpush as the frame delimiter.
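One way to cope with this mismatch would be to try both spellings of the delimiter against the first bytes of the body. The sketch below is only an illustration: declared_boundary and wire_delimiter are names I made up, and the parameter parsing is deliberately naive (it would break on a quoted boundary that contains a semicolon).

```python
def declared_boundary(content_type: str) -> str:
    """Return the boundary parameter of a Content-Type value, without quotes."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "boundary":
            return value.strip().strip('"')
    raise ValueError("no boundary parameter in Content-Type")


def wire_delimiter(content_type: str, body_start: bytes) -> bytes:
    """Guess which byte string the server really uses as frame delimiter.

    `body_start` is the first chunk of the response body. The form with a
    leading "--" is tried first because it is the longer match.
    """
    boundary = declared_boundary(content_type).encode("latin-1")
    start = body_start.lstrip(b"\r\n")
    for candidate in (b"--" + boundary, boundary):
        if start.startswith(candidate):
            return candidate
    raise ValueError("body does not start with the declared boundary")
```

With this, both the boundary=--myboundary streams and the MOBOTIX ones resolve to the delimiter that actually appears in the body.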
This irritated me quite a bit, so I thought of another approach to get the frames.
Since each JPEG starts with 0xFF 0xD8 as the Start of Image marker and ends with 0xFF 0xD9, I could just start looking for these. This seems to be a very dirty approach and I don't really like it, but it might be the most robust one.
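A rough sketch of the marker-scanning idea, so it is clear what I mean. The function name extract_frames is hypothetical, and the code assumes that neither byte pair shows up anywhere except at the real start and end of a frame. That assumption does not hold for a JPEG with an embedded thumbnail, for example, which carries its own pair of markers.

```python
SOI = b"\xff\xd8"  # Start of Image marker
EOI = b"\xff\xd9"  # End of Image marker


def extract_frames(buffer: bytes):
    """Return (frames, rest) for the bytes received so far.

    `frames` is a list of complete SOI..EOI byte strings. `rest` is the
    unconsumed tail that should be prepended to the next chunk read
    from the socket.
    """
    frames = []
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start == -1:
            # A trailing 0xFF may be the first half of an SOI marker.
            rest = buffer[-1:] if buffer.endswith(b"\xff") else b""
            return frames, rest
        end = buffer.find(EOI, start + 2)
        if end == -1:
            # Frame has started but is not complete yet.
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        pos = end + 2
```

Everything between frames (boundary, partial header, CRLFs) is simply skipped, which is why this approach does not care how the server spells its boundary.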
Before I start implementing this, are there some points I missed about MJPEG over HTTP? Is there any real specification of transmitting MJPEG over HTTP? What are the caveats when just watching for the Start and End markers of a JPEG instead of using the boundary to delimit frames?