0

When a browser renders data that was transferred using chunked encoding, it should render the original data without the chunk sizes and CRLFs that were added to encode it, correct?

Using this code as an example:

https://gist.github.com/josiahcarlson/3250376

My browsers (Chrome and Firefox) render:

12
this is chunk: 0

12
this is chunk: 1

12
this is chunk: 2

12
this is chunk: 3

12
this is chunk: 4

12
this is chunk: 5

12
this is chunk: 6

12
this is chunk: 7

12
this is chunk: 8

12
this is chunk: 9

0

I was not expecting to see the chunk sizes.

Should the data be rendered with or without the encoding information in the browser?

user2046021

3 Answers

1

HTTP/1.0 clients are not required to decode chunked data. The default HTTP version sent by Python's BaseHTTPServer module is HTTP/1.0. If you send a version of 1.1, the browser will render the data as you'd expect. I imagine that curl is just trying to be smart by doing the right thing even though the server is sending the wrong protocol version.

Patch the code to set the request handler's protocol_version attribute before sending the response. Add this at line 73 of your example:

self.protocol_version = 'HTTP/1.1'

For more detailed information on the differences between HTTP 1.0 and HTTP 1.1, see http://www8.org/w8-papers/5c-protocols/key/key.html
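For anyone on Python 3, here is a rough sketch of the same idea using http.server (the successor to BaseHTTPServer). The names ChunkedHandler and fetch_body are made up for this example and are not from the gist. It sets protocol_version on the handler class, sends three chunks, and reads them back with http.client so you can check what a client that honors the chunked framing ends up with.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ChunkedHandler(BaseHTTPRequestHandler):  # hypothetical name
    # Without this line the status line says HTTP/1.0, and a client
    # may treat the chunk framing as part of the body.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=UTF-8")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for i in range(3):
            chunk = b"this is chunk: %d\r\n" % i
            # Frame: chunk size in hex, CRLF, payload, CRLF.
            self.wfile.write(b"%X\r\n%s\r\n" % (len(chunk), chunk))
        # A zero-length chunk terminates the body.
        self.wfile.write(b"0\r\n\r\n")

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_body():
    """Start the server on a free local port, fetch "/", return the body."""
    server = HTTPServer(("127.0.0.1", 0), ChunkedHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        # Ask the server to close afterwards so the handler does not
        # sit waiting for another request on the same connection.
        conn.request("GET", "/", headers={"Connection": "close"})
        body = conn.getresponse().read()
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_body().decode("utf-8"))
```

With the HTTP/1.1 status line in place, the body returned by fetch_body() should contain only the "this is chunk" lines, with no size prefixes.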

0

The code explicitly sends that message. The generator creates these chunks:

yield "this is chunk: %s\r\n"%i

And then writes them to the socket

def write_chunk():
    tosend = '%X\r\n%s\r\n'%(len(chunk), chunk)
    self.wfile.write(tosend)

You can send whatever you want if you adapt it.

So if the chunk that is generated is "this is chunk: 0\r\n" then the write_chunk method actually sends "12\r\nthis is chunk: 0\r\n\r\n". The chunk is 18 bytes long, and %X writes that length in hexadecimal, which is 12.
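To make the framing concrete, here is a small standalone sketch (Python 3, not part of the gist; encode_chunk and decode_chunked are names I made up). The size prefix is the payload length in hexadecimal, which is why an 18-byte chunk shows up with "12" in front of it. The decoder is deliberately simplified: it assumes a complete body with no chunk extensions or trailers.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: hex length, CRLF, payload, CRLF."""
    return b"%X\r\n%s\r\n" % (len(data), data)


def decode_chunked(raw: bytes) -> bytes:
    """Strip chunk framing from a complete chunked body.

    Simplified: ignores chunk extensions and trailers.
    """
    body = b""
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        size = int(raw[pos:line_end], 16)  # size line is hexadecimal
        if size == 0:
            return body  # zero-length chunk marks the end
        start = line_end + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF after the payload


if __name__ == "__main__":
    framed = encode_chunk(b"this is chunk: 0\r\n") + b"0\r\n\r\n"
    print(repr(framed))
    print(repr(decode_chunked(framed)))
```

A client that honors Transfer-Encoding: chunked does the equivalent of decode_chunked before showing you anything, so the size lines never reach the page.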

"\r\n" is the escape sequence for carriage return followed by line feed, which is the Windows version of the newline. On Linux, plain text normally uses just \n.

aychedee
  • What I am not expecting to see is the chunk size and extra CRLFs – user2046021 Feb 06 '13 at 08:42
  • I'm sorry, I must not have been clear with my question. I'm not having a problem understanding how the code works. I was not expecting to see the length of the chunk rendered in my browser along with the other encoding information. Is that normal behavior? – user2046021 Feb 06 '13 at 09:02
  • Then why would you not expect to see that output? The code sends newlines and chunk sizes. Have you tried modifying the write_chunk method to not send them? – aychedee Feb 06 '13 at 09:02
  • The 'output' yes, what is rendered in my browser, no. – user2046021 Feb 06 '13 at 09:04
  • I'm fairly certain that on Linux you must include the \r in order for it to be properly chunk encoded: a CR (carriage return = \r) followed by an LF (line feed = \n). – user2046021 Feb 06 '13 at 09:07
  • Might be time to open up the Chrome developer console and look at the exact content of each response. Send some known good chunk encoded content and then compare against what this code is sending. – aychedee Feb 06 '13 at 09:16
  • Thanks I'll have to figure out where to find that plugin. – user2046021 Feb 06 '13 at 09:22
  • No need! In Chrome, anyway, it is built in. Just press Ctrl+Shift+J, click on the Network tab, reload the chunked page, and then look at each response received. – aychedee Feb 06 '13 at 09:45
0

Did you specify the content encoding in the headers?

LtWorf
  • I set the Transfer-Encoding to "chunked" and the Content-Type to "text/plain; charset=UTF-8". I also tried with the Content-Encoding set to chunked, but that made no difference and I did not think it was correct. – user2046021 Feb 06 '13 at 08:51
  • "chunked" is not a content-encoding. – Dietrich Epp Feb 06 '13 at 11:00