I'm currently working on receiving a TCP stream and analysing the HTTP data in Python. I have already learned how to decode chunked data here. My problem is that when I hold the whole HTTP response and start to decode it, the chunk-size prefix is much smaller than the actual size. I'll show it below:
This is the raw data I received:
b'000096F6\r\n<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml" prefix="og: http://opengraphprotocol.org/schema/ fb: http://www.facebook.com/2010/fbml d: http://dictionary.com/2011/dml">\n<head>\n<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>\n<base href="http://dictionary.reference.com/">\n<title>Search | Define Search at Dictionary.com</title>\n<script.....(more data)
You can see that the prefix size is (hex) 96F6 = 38646 bytes.
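As a quick check of that arithmetic, the size line can be converted with Python's built-in int(). The variable names here are only illustrative:

```python
# Parse the chunk-size line taken from the response shown above.
size_line = b'000096F6'
chunk_size = int(size_line, 16)  # int() accepts bytes as well as str
print(chunk_size)  # 38646
```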
But if I split the data with this algorithm:
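def decode_chunked(raw_data):
    # The data is handled as str here ('' and '\r\n' are str literals).
    encoded = raw_data
    new_data = ""
    while encoded != '':
        # The chunk-size line is hexadecimal and ends with CRLF.
        off = int(encoded[:encoded.index('\r\n')], 16)
        if off == 0:
            break
        encoded = encoded[encoded.index('\r\n') + 2:]
        new_data += encoded[:off]
        # Skip the chunk data and its trailing CRLF.
        encoded = encoded[off + 2:]
    return new_data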
I only get two damaged pieces:
(more data).....<div class="dot dot-left dot-bottom "></
and
v>
<div class="language-name oneClick-disabled">.....(more data)
So it throws an exception, because off cannot be parsed in the next loop iteration. When I carefully inspected the response body, I found that len(data) is 78543 and len(data.decode()) is 78503, and the whole response has only one chunk!
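A gap like the one between those two lengths is what I would expect whenever the body contains multi-byte UTF-8 characters: len() on bytes counts bytes, while len() on the decoded str counts characters. A tiny illustration with a made-up string (not the real response):

```python
# Made-up sample containing two non-ASCII characters.
data = "naïve café".encode("utf-8")
print(len(data))           # 12 (bytes; each accented letter takes 2 bytes)
print(len(data.decode()))  # 10 (characters)
```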
Then I tried lots of websites, and they all have this problem.
So my question is: what am I doing wrong? How do I correctly decode this kind of data? Thanks to anyone who can help!
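One direction that might be worth checking: in chunked transfer coding the size prefix counts bytes, so slicing a decoded str by that number can land in the wrong place when the body has multi-byte characters. Below is a minimal sketch of the same loop working on bytes and decoding only at the end. The function name decode_chunked_bytes and the sample body are hypothetical, and the sketch assumes the whole body is already in memory and ignores trailers:

```python
def decode_chunked_bytes(body: bytes) -> bytes:
    """Hypothetical helper: join the chunks of a chunked HTTP body."""
    decoded = b""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Drop any chunk extension after ';' before parsing the hex size.
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            break  # last chunk; trailers are ignored in this sketch
        start = line_end + 2
        decoded += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its CRLF
    return decoded


# Made-up sample: "é" is one character but two bytes in UTF-8,
# so the first chunk is 3 bytes long but only 2 characters.
sample = b"3\r\n\xc3\xa9a\r\n2\r\nbc\r\n0\r\n\r\n"
text = decode_chunked_bytes(sample).decode("utf-8")
print(text)
```

Decoding to str only after all chunks are joined keeps every offset in bytes, which is the unit the size prefix uses.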