I'm running a Python scraper on my server. It needs to scrape a lot of URLs, so I wanted to speed it up and decided to use multiprocessing.
The problem is that the scraping takes a really long time, so my browser connection eventually times out. That's when I get the [Errno 32] Broken pipe
error.
Is there anything I can do to keep the script running? Can I suppress the error and let the script continue? If I catch it, the script stops anyway, right?
What are my options here? Or do I have to stop using multiprocessing for time-consuming scripts that run on a server?
from itertools import repeat
from multiprocessing import Pool

with Pool(5) as p:
    p.starmap(download_slick_slide_html,
              zip(sndLinkList, repeat(mode), repeat(pathToFF),
                  repeat(pathToBinaries), repeat(dateTime),
                  repeat(scrapedSupplier), repeat(logfile)))
    p.close()
    p.join()
def download_slick_slide_html(f_snd_link_list, f_mode, f_path_to_ff, f_path_to_binaries,
                              f_date_time, f_scraped_supplier, f_log_file):
    # do some downloading here...
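To show what I mean by "catching it": I tried wrapping the worker body in a try/except so one failing URL doesn't kill the whole pool. This is a simplified sketch of my setup (fewer parameters, and the error is simulated instead of coming from a real connection), not my actual code:

```python
from itertools import repeat
from multiprocessing import Pool


def download_slick_slide_html(link, mode, log_file):
    """Hypothetical worker: any OSError is caught and reported
    so a single failed URL does not stop the pool."""
    try:
        if "bad" in link:
            # simulate the [Errno 32] error I am seeing
            raise BrokenPipeError(32, "Broken pipe")
        return (link, "ok")
    except OSError as exc:
        # swallow the error and return it as a result instead of crashing
        return (link, f"failed: {exc}")


def run_downloads(links):
    # same Pool/starmap pattern as in my real script
    with Pool(2) as p:
        return p.starmap(
            download_slick_slide_html,
            zip(links, repeat("headless"), repeat("log.txt")),
        )


if __name__ == "__main__":
    print(run_downloads(["http://ok.example", "http://bad.example"]))
```

This keeps errors raised *inside* the workers from stopping the run, but as far as I can tell it does nothing about the broken pipe in the parent process when the browser connection drops, which is my actual question.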