0

I'm not clear about how to calculate the `Content-Length` header in HTTP.

Take an example,

HEADER
...
Content-Type: text/html
(blank line `\r\n')
<html></html>
(blank line `\r\n')

This is a working HTTP message that sends an empty HTML page (correct me if there is any problem :-)). What should the content length be: 15, or 17 (counting the blank line between the headers and the entity)?

Thanks in advance. Best regards.

powtac
Summer_More_More_Tea

2 Answers

4

According to the W3C, Content-Length is defined as follows:

The Content-Length entity-header field indicates the size of the entity-body, in decimal number of OCTETs, sent to the recipient or, in the case of the HEAD method, the size of the entity-body that would have been sent had the request been a GET.

As far as I understand it, you have to count everything after the blank line that ends the headers. My answer to your question would therefore be 15.

powtac
  • Thanks for the fast reply. I'm receiving data over a `keep-alive` connection, so I think I should use the `Content-Length` value as a counter and read that many bytes starting from the entity. Unfortunately, when the stream ends, the counter is 2 instead of 0. I can't figure it out. I suspect the extra 2 is for the blank line between the headers and the entity, but I can't find any documentation that confirms my assumption. – Summer_More_More_Tea Sep 05 '11 at 16:01
  • You should definitely NOT be hard-coding an offset. Read the headers, skip the blank line that follows them, then read however many bytes the `Content-Length` header says to read. Also keep in mind that some responses may use a `Transfer-Encoding: chunked` header instead of a `Content-Length` header, so be prepared for that, as well as for responses that signal the end of the body by disconnecting instead of using either header. Read RFC 2616; it explains how to handle an entity length correctly. – Remy Lebeau Sep 06 '11 at 01:01
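The reading procedure described in the comment above could look roughly like the following Python sketch. The function name `read_body` and the sample messages are made up for illustration, and the sketch only covers the `Content-Length` case; chunked and close-delimited bodies are deliberately left out.

```python
import io


def read_body(stream):
    """Read one HTTP message from a binary stream and return its body.

    Hypothetical helper: handles only Content-Length, not
    Transfer-Encoding: chunked or close-delimited bodies.
    """
    stream.readline()  # start line (request or status line), ignored here

    headers = {}
    while True:
        line = stream.readline()
        # The blank line ends the headers. It is consumed here and is
        # NOT part of the byte count given by Content-Length.
        if line in (b"\r\n", b"\n", b""):
            break
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()

    length = int(headers.get(b"content-length", b"0"))
    return stream.read(length)


# Two made-up messages back to back, as on a keep-alive connection.
stream = io.BytesIO(
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 15\r\n"
    b"\r\n"
    b"<html></html>\r\n"
    b"HTTP/1.1 204 No Content\r\n"
    b"\r\n"
)

first = read_body(stream)
second = read_body(stream)
print(first, second)
```

Because the blank line is consumed while parsing the headers, exactly `Content-Length` bytes remain to be read and the next message starts right after them. A counter that ends at 2 instead of 0 suggests the blank line was subtracted from the count by mistake.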
2

15 is the correct answer: 13 bytes for `<html></html>` plus 2 bytes for the `\r\n` that follows it. The line break at the END of the entity data is part of the entity, not of the HTTP protocol, so it is counted. DO NOT count the blank line between the headers and the entity.
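To double-check the count, here is a minimal Python sketch. The raw message is a made-up example modelled on the one in the question (the status line and variable names are assumptions); it splits the message at the first blank line and counts the bytes that follow.

```python
# Hypothetical raw message modelled on the example in the question.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"                 # blank line ending the headers (not counted)
    b"<html></html>\r\n"    # entity body, including its trailing line break
)

# Everything after the first CRLF CRLF is the entity body.
head, _, body = raw.partition(b"\r\n\r\n")

content_length = len(body)
print(content_length)  # prints 15
```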

Remy Lebeau
  • Good explanation! The first line break is part of the HTTP specification, therefore do not count it. – powtac Sep 05 '11 at 21:25
  • Sorry, *\r\n* (the one at the end of the entity body) counts as 2 bytes, right? If I analyse the request body with software like Wireshark, \r\n counts as two bytes, 0d 0a in hex, but if I export those bytes to a file I see a ^M instead of the \r\n characters, and it counts as only 1 byte, so how should I handle this? – tonix Dec 27 '14 at 09:33
  • Yes, `\r\n` is 2 bytes, `0d 0a`. `^M` is just how some text editors display `0d` when it is by itself without a trailing `0a`. If you see 2 bytes in the capture, but only 1 byte is being exported, then the export is faulty. That has nothing to do with the HTTP protocol itself. – Remy Lebeau Dec 27 '14 at 10:13