MJPEG over HTTP Specification

Question

I was trying to create a tool to grab frames from a mjpeg stream that is transmitted over http. I did not find any specification so I looked at what wikipedia says here:

In response to a GET request for a MJPEG file or stream, the server streams the sequence of JPEG frames over HTTP. A special mime-type content type multipart/x-mixed-replace;boundary=<boundary-name> informs the client to expect several parts (frames) as an answer delimited by <boundary-name>. This boundary name is expressly disclosed within the MIME-type declaration itself.

But this doesn't seem to be very accurate in practice. I dumped some streams to find out how they behave. Most streams have the following format (where CRLF is a carriage return line feed, and a partial header are some header fields without a status line):

Status line (e.g. HTTP/1.0 200 OK) CRLF
Header fields (e.g. Cache-Control: no-cache) CRLF
Content-Type header field (e.g. Content-Type: multipart/x-mixed-replace; boundary=--myboundary) CRLF
CRLF (Denotes that the header is over)
Boundary (Denotes that the first frame is over) CRLF
Partial header fields (mostly: Content-type: image/jpeg) CRLF
CRLF (Denotes that this "partial header" is over)
Actual frame data CRLF
(Sometimes here is an optional CRLF)
Boundary
Starting again at partial header (line 6)

The first frame never contained actual image data. All of the analyzed streams had the Content-Type header, with the type set to multipart/x-mixed-replace.

But some of the streams get things wrong here:

Two Servers claimed boundary="MOBOTIX_Fast_Serverpush" but then used --MOBOTIX_Fast_Serverpush as frame delimiter.

This irritated me quite a bit so I though of an other approach to get the frames.

Since each JPEG starts with 0xFF 0xD8 as Start of Image marker and ends with 0xFF 0xD9 I could just start looking for these. This seems to be a very dirty approach and I don't really like it, but it might be the most robust one.

Before I start implementing this, are there some points I missed about MJPEG over HTTP? Is there any real specification of transmitting MJPEG over HTTP? What are the caveats when just watching for the Start and End markers of a JPEG instead of using the boundary to delimit frames?

score 23 · Accepted Answer · edited Oct 07 '21 at 07:59

this doesn't seem to be very accurate in practice.

It is very accurate in practice. You are just not handling it correctly.

The first frame never contained actual image data.

Yes, it does. There is always a starting boundary before the first MIME entity (as MIME can contain prologue data before the first entity). You are thinking that MIME boundaries exist only after each MIME entity, but that is simply not true.

I suggest you read the MIME specification, particularly RFC 2045 and RFC 2046. MIME works fine in this situation, you are just not interpreting the results correctly.

Actual frame data CRLF
(Sometimes here is an optional CRLF)
Boundary

Actually, that last CRLF is NOT optional, it is actually part of the next boundary that follows a MIME entity's data (see RFC 2046 Section 5). MIME boundaries must appear on their own lines, so a CRLF is artificially inserted after the entity data, which is especially important for data types (like images) that are not naturally terminated by their own CRLF.

Two Servers claimed boundary="MOBOTIX_Fast_Serverpush" but then used --MOBOTIX_Fast_Serverpush as frame delimiter

That is how MIME is supposed to work. The boundary specified in the Content-Type header is always prefixed with -- in the actual entity stream, and the terminating boundary after the last entity is also suffixed with -- as well.

For example:

Content-Type: multipart/x-mixed-replace; boundary="MOBOTIX_Fast_Serverpush"

--MOBOTIX_Fast_Serverpush
Content-Type: image/jpeg

<jpeg bytes>
--MOBOTIX_Fast_Serverpush
Content-Type: image/jpeg

<jpeg bytes>
--MOBOTIX_Fast_Serverpush
... and so on ...
--MOBOTIX_Fast_Serverpush--

This irritated me quite a bit so I though of an other approach to get the frames.

What you are thinking of will not work, and is not as robust as you are thinking. You really need to process the MIME stream correctly instead.

When processing multipart/x-mixed-replace, what you are supposed to do is:

read and discard the HTTP response body until you reach the first MIME boundary specified by the Content-Type response header.
then read a MIME entity's headers and data until you reach the next matching MIME boundary.
then process the entity's data as needed, according to its headers (for instance, displaying a image/jpeg entity onscreen).
if the connection has not been closed, and the last boundary read is not the termination boundary, go back to 2, otherwise stop processing the HTTP response.

MJPEG over HTTP Specification

1 Answers1