
I'm trying to write a client for a site that provides data as an HTTP stream (a.k.a. HTTP server push). However, urllib2.urlopen() grabs the stream in its current state and then closes the connection. I tried skipping urllib2 and using httplib directly, but that seems to behave the same way.

The request is a POST with five parameters; no cookies or authentication are required.

Is there a way to keep the stream open, so it can be checked for new content on each pass through the program's main loop, rather than re-downloading the whole thing every few seconds and introducing lag?

Sam

3 Answers


You could try the requests library, which can keep the connection open and iterate over the response as it arrives:

import requests
r = requests.get('http://httpbin.org/stream/20', stream=True)

for line in r.iter_lines():
    # filter out keep-alive new lines
    if line:
        print line

You could also add parameters:

import requests
settings = {'interval': '1000', 'count': '50'}
url = 'http://agent.mtconnect.org/sample'

r = requests.get(url, params=settings, stream=True)

for line in r.iter_lines():
    if line:
        print line
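
Since the request in the question is a POST, the same streaming pattern works with requests.post; the URL and parameter names below are placeholders, not anything from the actual site:

import requests

# Placeholder URL and parameter names -- substitute the five the site expects.
payload = {'param1': 'a', 'param2': 'b'}
r = requests.post('http://example.com/stream', data=payload, stream=True)

for line in r.iter_lines():
    if line:
        print line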
wiesson

Do you need to actually parse the response headers, or are you mainly interested in the content? And is your HTTP request complex, requiring you to set cookies and other headers, or will a very simple request suffice?

If you only care about the body of the HTTP response and don't have a very fancy request, you should consider simply using a socket connection:

import socket

SERVER_ADDR = ("example.com", 80)

sock = socket.create_connection(SERVER_ADDR)
f = sock.makefile("r+", bufsize=0)   # unbuffered file object wrapped around the socket

f.write("GET / HTTP/1.0\r\n"
      + "Host: example.com\r\n"    # you can put other headers here too
      + "\r\n")

# skip headers
while f.readline() != "\r\n":
    pass

# keep reading forever
while True:
    line = f.readline()     # blocks until more data is available
    if not line:
        break               # we ran out of data!

    print line

sock.close()
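
The question's request is a POST with five parameters; here is a sketch of the same socket approach for that case (the path and parameter names are placeholders):

import socket
import urllib

SERVER_ADDR = ("example.com", 80)

# Placeholder parameter names -- use the five the site actually expects.
body = urllib.urlencode({'param1': 'a', 'param2': 'b'})

sock = socket.create_connection(SERVER_ADDR)
f = sock.makefile("r+", bufsize=0)

f.write("POST /stream HTTP/1.0\r\n"
      + "Host: example.com\r\n"
      + "Content-Type: application/x-www-form-urlencoded\r\n"
      + "Content-Length: %d\r\n" % len(body)
      + "\r\n"
      + body)

# then read the response exactly as in the GET example above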
Eli Courtwright
  • This works for a bit (once I got the headers for a POST request right, anyway). However, after a few seconds, the connection seems to terminate and I get a " – Sam Apr 26 '10 at 22:12
  • @Sam: The fact that you're reading ` – Eli Courtwright Apr 26 '10 at 22:17
  • There's definitely more, because it comes up in my web browser reading the same stream. However, looking at the page source, a JavaScript runs every six seconds, changing the window.location to a POST request with different parameters; specifically, it changes "rnd=0.749976718186" to a different number. I have no idea what this does, but I suspect it's related to the stream terminating early. I'll have to speak to the owner of the stream and get back to you. – Sam Apr 26 '10 at 22:31
  • Problem solved! The page I'm interfacing with requires another connection to be refreshed every 20 seconds or so or it kills the connection off because it thinks you've disconnected. Add code to grab that every few seconds and bingo, all worky. Thanks! – Sam Apr 26 '10 at 23:31
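
A minimal sketch of the keep-alive trick described in that last comment, assuming a hypothetical refresh URL: fetch it on a daemon thread every few seconds so the server doesn't conclude you've disconnected and drop the stream.

import threading
import time
import urllib2

KEEPALIVE_URL = 'http://streamingsite.com/keepalive'   # hypothetical

def keep_alive(interval=15):
    # Periodically re-fetch the keep-alive URL so the server
    # doesn't assume we've disconnected and kill the stream.
    while True:
        urllib2.urlopen(KEEPALIVE_URL).read()
        time.sleep(interval)

t = threading.Thread(target=keep_alive)
t.setDaemon(True)    # don't block interpreter exit
t.start()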

One way to do it with urllib2 (assuming the site also requires HTTP Basic Auth):

import urllib2

url = 'http://streamingsite.com'

p_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
p_mgr.add_password(None, url, 'login', 'password')

auth = urllib2.HTTPBasicAuthHandler(p_mgr)
opener = urllib2.build_opener(auth)

f = opener.open(url)

while True:
    data = f.readline()
    if not data:
        break       # server closed the connection
    print data
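
For the POST in the question, passing a urlencoded body to open() turns the request into a POST (the parameter names here are placeholders):

import urllib

data = urllib.urlencode({'param1': 'a', 'param2': 'b'})  # placeholder names
f = opener.open(url, data)   # supplying a body makes urllib2 send a POST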
rlotun
  • This doesn't appear to work. I dropped the auth stuff 'cos I don't need it and just used an HTTPHandler. Also added a sleep() to the loop to stop it eating too much CPU, and printing to screen if any data is encountered. It runs through the contents of the stream as they exist when the script is started, and then doesn't get any further data. – Sam Apr 26 '10 at 21:55