
getresponse() issues many recv calls while reading the header of an HTTP response. In fact, it issues one recv for each byte, which results in a large number of system calls. How can this be optimized?

I verified this on an Ubuntu machine with an strace dump.

Sample code:

import httplib  # Python 2 module (renamed http.client in Python 3)

conn = httplib.HTTPConnection("www.python.org")
conn.request("HEAD", "/index.html")
r1 = conn.getresponse()

strace dump:

sendto(3, "HEAD /index.html HTTP/1.1\r\nHost:"..., 78, 0, NULL, 0) = 78
recvfrom(3, "H", 1, 0, NULL, NULL)      = 1
recvfrom(3, "T", 1, 0, NULL, NULL)      = 1
recvfrom(3, "T", 1, 0, NULL, NULL)      = 1
recvfrom(3, "P", 1, 0, NULL, NULL)      = 1
recvfrom(3, "/", 1, 0, NULL, NULL)      = 1
...
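The one-byte-per-call pattern above is what a line-oriented reader produces on an unbuffered stream: with no buffer to read ahead into, readline() can only request one byte at a time so that it never consumes data past the newline. In Python 2's httplib, HTTPResponse wraps the socket with makefile('rb', 0) (unbuffered) unless buffering is requested. This Python 3 sketch simulates the effect without a network, using a hypothetical CountingRaw class as an in-memory stand-in for the socket that counts read calls (the analogue of recvfrom). It illustrates the mechanism only and is not httplib's own code; the response header bytes are made up.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical socket stand-in: serves bytes from memory and counts
    how many read calls (the analogue of recvfrom) it receives."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        # Like recv(), hand back as many bytes as are available, up to len(b).
        chunk = self._data[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n


# Made-up response header used only for this illustration.
HEADER = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 0\r\n\r\n"


def read_header_lines(stream):
    """Read the status line and headers up to the blank line."""
    lines = []
    while True:
        line = stream.readline()
        lines.append(line)
        if line in (b"\r\n", b"\n", b""):
            return lines


# Unbuffered: readline() on a raw stream asks for one byte per call.
unbuffered = CountingRaw(HEADER)
unbuffered_lines = read_header_lines(unbuffered)

# Buffered: BufferedReader fetches a large block and splits lines from it.
buffered_raw = CountingRaw(HEADER)
buffered_lines = read_header_lines(io.BufferedReader(buffered_raw))

print("unbuffered read calls:", unbuffered.calls)
print("buffered read calls:", buffered_raw.calls)
```

Both readers return the same header lines; only the number of underlying read calls differs, which is the cost visible in the strace output.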
Question asked by nik_kgp, edited by Jon Clements.

1 Answer

On Python 2 (httplib), pass buffering=True to getresponse():

r = conn.getresponse(buffering=True)

On Python 3.1+ there is no buffering parameter: buffered reading is already the default.
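To see the Python 3 default in action without touching the network, this sketch pushes a made-up response through a local socket pair (socket.socketpair) and lets http.client.HTTPResponse parse it. Constructing HTTPResponse by hand is normally HTTPConnection's job and is done here only so the object can be inspected; it is an illustration, not a recommended usage pattern.

```python
import io
import socket
import http.client

# Local socket pair: no network access needed. The response bytes are made up.
server_end, client_end = socket.socketpair()
server_end.sendall(b"HTTP/1.1 200 OK\r\n"
                   b"Content-Length: 5\r\n"
                   b"\r\n"
                   b"hello")

# Normally created by HTTPConnection.getresponse(); built directly here
# only to inspect it.
resp = http.client.HTTPResponse(client_end, method="GET")
resp.begin()  # parses the status line and headers

# In Python 3 the response reads through a buffered file object, so header
# lines come out of large recv() calls instead of one byte at a time.
is_buffered = isinstance(resp.fp, io.BufferedReader)
status = resp.status
body = resp.read()

resp.close()
server_end.close()
client_end.close()

print("buffered:", is_buffered, "status:", status, "body:", body)
```

The check relies on HTTPResponse keeping its file object in the fp attribute, which is an implementation detail rather than documented API, so it is only meant as a quick way to confirm the default.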

Answered by jfs.
  • I'm getting a lot of recvfrom() calls that each read a single byte when using urllib2.urlopen. I found that urllib2 uses HTTPConnection internally, but no arguments are passed in its getresponse() call. Is there any way to get rid of this enormous number of tiny recvfrom() calls? – Andrei Belov Sep 01 '14 at 11:01
  • @AndreiBelov Have you tried using HTTPConnection directly and passing `buffering=True` to its getresponse() method? – jfs Sep 01 '14 at 11:07
  • @J.F.Sebastian Unfortunately, that's not an option. I've just figured out that urllib2 reads 1-byte chunks only for the response headers; once the server starts sending the body in chunked encoding, things look better: 323 recvfrom(3, "\r", 1, 0, NULL, NULL) = 1 324 recvfrom(3, "\n", 1, 0, NULL, NULL) = 1 325 recvfrom(3, " \n – Andrei Belov Sep 01 '14 at 13:44