I've got two functions that each work just fine on their own, but they seem to break down when I run them nested together:
def scrape_all_pages(alphabet):
    pages = get_all_urls(alphabet)
    for page in pages:
        scrape_table(page)
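For reference, I kick the whole thing off with something along these lines (I just pass the letters as a string; the exact form doesn't matter much):

import string

# one batch of search-result URLs per letter
scrape_all_pages(string.ascii_lowercase)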
I'm trying to systematically scrape some search results. get_all_urls() creates a list of URLs for each letter in the alphabet. Sometimes there are thousands of pages, but that works just fine. Then, for each page, scrape_table scrapes just the table I'm interested in. That also works fine. I can run the whole thing and it works, but I'm working in ScraperWiki, and if I set it to run and walk away it invariably gives me a "list index out of range" error. This is definitely an issue within ScraperWiki, but I'd like to find a way to zero in on the problem by adding some try/except clauses and logging errors when I encounter them. Something like:
def scrape_all_pages(alphabet):
    try:
        pages = get_all_urls(alphabet)
    except:
        ## LOG THE ERROR IF THAT FAILS
        pass
    try:
        for page in pages:
            scrape_table(page)
    except:
        ## LOG THE ERROR IF THAT FAILS
        pass
I haven't been able to figure out how to generically log errors, though. Also, the above looks clunky, and in my experience, when something looks clunky, Python has a better way. Is there a better way?
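In case it helps, here is the direction I was headed for the logging part, using the standard logging module. The log filename and the messages are just placeholders; get_all_urls and scrape_table are my functions from above:

import logging

# write errors to a local file; the filename is just a placeholder
logging.basicConfig(filename="scrape_errors.log", level=logging.ERROR)

def scrape_all_pages(alphabet):
    try:
        pages = get_all_urls(alphabet)
    except Exception:
        # logging.exception records the message plus the full traceback
        logging.exception("get_all_urls failed for %r", alphabet)
        return
    try:
        for page in pages:
            scrape_table(page)
    except Exception:
        logging.exception("scrape_table failed on %s", page)

That at least gets the traceback into a file, but it still has the same clunky shape as my sketch above.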