
I'm currently working with a function that pulls corporate filings using a library that scrapes the SEC EDGAR database. The problem, I believe, is that I'm trying to build a dataset of a few hundred companies through a loop that calls that function. Sometimes I make it through a hundred names, other times just a few dozen, before getting the following error, which I believe is the result of hitting the server too often:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.sec.gov', port=443): Max retries exceeded with url: /Archives/edgar/data/899051/000089905119000007/0000899051-19-000007-index.htm (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7afd9c91d110>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

Does anyone have tips or suggestions on how to add a sleep function and/or a proxy updater that can help avoid getting throttled? If so, what is the best practice for incorporating them: inside the function that scrapes the database, or inside the loop that calls the function each time?

GRobs

1 Answer

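import time

offset = 10  # minimum seconds between calls; adjust to what the server allows

for company in companies:  # companies stands in for your list of names
    start = time.time()
    my_function(company)  # the scraping function you've already written
    # Wait out whatever is left of the interval before the next call
    elapsed = time.time() - start
    if elapsed < offset:
        time.sleep(offset - elapsed)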

Where my_function() is the function you've already written and offset is the minimum number of seconds between calls (you will need to experiment with it to find what the server allows). Let us know if this works! :)
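If you would rather keep the delay inside the function itself (the other placement the question asks about), a decorator can enforce a minimum gap between calls no matter where the function is called from. This is a minimal sketch under that assumption; throttled is a hypothetical helper name, not part of the scraping library.

```python
import functools
import time


def throttled(min_interval):
    """Decorator (hypothetical helper): keep at least min_interval seconds between calls."""
    def decorator(func):
        last_call = float("-inf")  # monotonic time of the previous call

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_call
            # Sleep only for the part of the interval that hasn't elapsed yet
            wait = last_call + min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            last_call = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

You would put @throttled(10) on the line above your existing def my_function(...), and the calling loop then needs no sleep of its own. Attaching the throttle to the function means every call site is rate-limited, which is harder to forget than a sleep in each loop.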

S_Zizzle
    Thank you for your tip. Unfortunately I'm still getting the connection error after several iterations, possibly from getting my proxy throttled. I may need to incorporate changing up my proxy. Any idea on how to add that functionality? – GRobs Jul 05 '20 at 16:39
  • @G_Roberts I'd imagine it would be best to send fewer requests and get more information per request. Can you change the call you are using? Even if it's a single request to get everything, you could then parse through that yourself. That's all I know, I'm afraid. Good luck! – S_Zizzle Jul 06 '20 at 12:40
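Since the error reportedly persists even with a delay, another option the thread doesn't mention, but which commonly complements a fixed delay, is retrying a failed call with exponential backoff: wait, double the wait after each further failure, and give up after a few attempts. In this minimal sketch, call_with_backoff is a hypothetical helper (not part of the scraping library). It relies on the fact that requests' exceptions, including requests.exceptions.ConnectionError, derive from OSError. Backoff only helps with transient failures; if the network is genuinely unreachable, it will just fail more slowly.

```python
import random
import time


def call_with_backoff(func, *args, retries=5, base_delay=2.0, max_delay=60.0,
                      retry_on=(OSError,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff (hypothetical helper).

    requests.exceptions.RequestException subclasses OSError, so the default
    retry_on also catches requests.exceptions.ConnectionError.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: let the caller see the original error
            # Double the wait on each attempt, cap it, and add up to 10% jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
```

In your loop you would call call_with_backoff(my_function, ...) with the same arguments you pass to my_function today. A failure that survives every retry is re-raised, so you can catch it, note which company failed, and move on to the next one.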