
I'm using urlopen() to open a website and pull (financial) data from it. Here is my line:

sourceCode = urlopen('xxxxxxxx').read()

After this, I then pull the data I need out. I loop through different pages on the same domain to pull data (stock info). I end the body of the loop with:

time.sleep(1)

as I'm told that keeps the site from blocking me. My program will run for a few minutes, but at some point, it stalls and quits pulling data. I can rerun it and it'll run another arbitrary amount of time and then stall.
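
For reference, the loop is shaped roughly like this (the URL pattern and ticker symbols below are placeholders, not the real ones):

from urllib.request import urlopen
import time

tickers = ['AAA', 'BBB', 'CCC']  # placeholder symbols

for ticker in tickers:
    # placeholder URL pattern standing in for the real domain
    sourceCode = urlopen('http://example.com/quote/' + ticker).read()
    # ... pull the financial data out of sourceCode here ...
    time.sleep(1)  # pause between pages so the site doesn't block me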

Is there something I can do to prevent this?

Micah Cobb
    Firstly, I would look at their terms of service to see whether they allow what you're doing. Not saying you shouldn't, but it is worth considering. You could try to increase the time interval and see if the timeouts persist. – Dziugas Jul 03 '17 at 17:25
  • Hi Micah. time.sleep(1) just sleeps your program; it doesn't keep the connection alive. If you want that, you have to create a session using the "requests.session" function (sketched below). – Bhanu Tez Mar 05 '19 at 21:40
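
A minimal sketch of what that comment suggests, assuming the third-party requests package is installed (the URL pattern and symbols are placeholders):

import requests
import time

session = requests.Session()  # reuses the underlying connection across requests
session.headers.update({'User-Agent': 'Mozilla/5.0'})  # optional browser-like header

for ticker in ['AAA', 'BBB', 'CCC']:  # placeholder symbols
    # timeout makes the call raise an exception instead of hanging forever
    response = session.get('http://example.com/quote/' + ticker, timeout=10)
    sourceCode = response.text
    time.sleep(1)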

1 Answer


This worked (for most websites) for me:

If you're using the urllib.request library, you can create a Request and spoof the user agent so the request looks like it comes from a browser, which may stop the site from blocking you.

from urllib.request import Request, urlopen

# send a browser-like User-Agent header with the request
req = Request(path, headers={'User-Agent': 'Mozilla/5.0'})
data = urlopen(req).read()
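
If the stall is the connection hanging rather than the site blocking you, you can also pass a timeout to urlopen and retry; here's a rough sketch (the fetch helper, the timeout value, and the retry count are my own choices, not part of the answer above):

import time
from urllib.request import Request, urlopen

def fetch(path, retries=3):
    req = Request(path, headers={'User-Agent': 'Mozilla/5.0'})
    for attempt in range(retries):
        try:
            # timeout stops urlopen from blocking forever on a dead connection
            return urlopen(req, timeout=10).read()
        except OSError:  # URLError and socket timeouts are both subclasses of OSError
            time.sleep(2)  # brief pause before trying the page again
    return None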

Hope this helps

GStacey