Scrapinghub spider finishes and closes before task is done

Asked Apr 09 '20 at 23:52

Active Apr 09 '20 at 23:52

Viewed 44 times

I am using Scrapinghub cloud with a Splash instance to scrape content and images from a large list of URLs that are provided with the spider. There are around 50,000 URLs that I wish to crawl.

The first time I ran it, the spider ran for just under 13 hours and then closed after scraping only 11k URLs. The next time I ran it, it ran for only 2 hours and scraped 2k URLs.

The only message I got was the following:

(TCP Port 6023 Closed)

Please let me know of any possible solutions, or what further information I can provide.
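Since both runs stopped partway through the list, one workaround I am considering is re-running only the URLs that have not been scraped yet, in smaller batches. Below is a rough sketch of that idea in plain Python. It is not tied to the Scrapinghub or Scrapy API; the function names `remaining_urls` and `batches` and the example URLs are hypothetical, and it assumes I can export the list of already-scraped URLs from the finished job's items.

```python
from typing import Iterable, Iterator, List, Set


def remaining_urls(all_urls: Iterable[str], done_urls: Iterable[str]) -> List[str]:
    """Return the URLs that still need scraping.

    Keeps the original order, skips blank lines, and drops duplicates.
    Both arguments are hypothetical inputs: the full URL list shipped with
    the spider, and the URLs already present in a previous job's output.
    """
    done: Set[str] = {u.strip() for u in done_urls if u.strip()}
    seen: Set[str] = set()
    todo: List[str] = []
    for url in all_urls:
        url = url.strip()
        if not url or url in done or url in seen:
            continue
        seen.add(url)
        todo.append(url)
    return todo


def batches(urls: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs, one slice per job."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


if __name__ == "__main__":
    # Placeholder data for illustration only.
    all_urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    done = ["https://example.com/a"]
    for number, batch in enumerate(batches(remaining_urls(all_urls, done), 2), start=1):
        print(number, batch)
```

Each batch could then be passed to a separate, shorter job, so that an early close costs only one batch instead of the whole crawl. This does not explain why the spider closes early; it only limits the damage while I look for the cause.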

asked Apr 09 '20 at 23:52

BradleyB19

The Scrapinghub team can answer this; ask their support. Is Splash running on that port? Maybe it went down? – Umair Ayub Apr 10 '20 at 06:25
Thanks for the suggestion @Umair. Splash is not running on that port. I've messaged Scrapinghub support with no success. Is there a better forum to ask on? – BradleyB19 Apr 10 '20 at 12:40

0 Answers