
I'm running a Python scraper on my server. It needs to scrape lots of URLs, so I wanted to make it faster and decided to use multiprocessing. The problem is that the scraping process takes a really long time, so my browser connection eventually times out. As a result I get the [Errno 32] Broken pipe error.

Is there anything I can do to keep the script running? Can I suppress the error and let the script continue? If I catch it, the script stops anyway, right?

What are my options here? Or do I have to stop using multiprocessing for time-consuming scripts that run on a server?

    from multiprocessing import Pool
    from itertools import repeat

    def download_slick_slide_html(f_snd_link_list, f_mode, f_path_to_ff, f_path_to_binaries,
                                  f_date_time, f_scraped_supplier, f_log_file):
        # do some downloading here...
        pass

    with Pool(5) as p:
        p.starmap(download_slick_slide_html,
                  zip(sndLinkList, repeat(mode), repeat(pathToFF), repeat(pathToBinaries),
                      repeat(dateTime), repeat(scrapedSupplier), repeat(logfile)))
        p.close()
        p.join()
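For illustration, this is roughly what I mean by suppressing the error per task: a minimal sketch (the `safe_download` and `url` names are made up, not my actual scraper) where each worker catches its own exceptions, so one failing URL returns an error result instead of stopping the whole pool.

```python
from multiprocessing import Pool

def safe_download(url):
    # Illustrative stand-in for the real download function;
    # catch everything so one bad URL can't kill the worker.
    try:
        if "bad" in url:
            raise ValueError("simulated failure")
        return (url, "ok")
    except Exception as e:
        return (url, "error: %s" % e)

if __name__ == "__main__":
    urls = ["http://a", "http://bad", "http://c"]
    with Pool(2) as p:
        results = p.map(safe_download, urls)
    print(results)
```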
asked by acincognito
  • It would be really helpful if you would post the relevant code – Avi Mosseri May 27 '18 at 03:08
  • It's basically the standard code that you find in the documents/every tutorial. – acincognito May 27 '18 at 03:12
  • I searched around when I had this problem and tried a lot, like adding sleep time based on suggestions from other threads. In the end, coincidentally, I reinstalled Python 2.7 instead of 3.7 and the same code just works now... I don't know why, but I thought I'd write this down in case you or other people have the same issue I had and have wasted many hours on it... – tun Oct 18 '18 at 11:39

0 Answers