
The first time I installed scrapyd on Ubuntu 14.04, I didn't use the generic way.

With apt-get, scrapyd was installed as a service that could be started and had its (log/config/dbs...) directories in place; however, the bundled Scrapy version was very outdated.

So I installed scrapyd with pip in a virtualenv. Although it is up to date, I can't start scrapyd as a service and I can't find any of those directories. Where do I create the configuration file that sets the (eggs/dbs/items/log) paths?
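(For reference, scrapyd reads an INI-style config from `/etc/scrapyd/scrapyd.conf`, `~/.scrapyd.conf`, or a `scrapyd.conf` in the directory it is started from. A minimal sketch, with placeholder paths, might look like:)

```ini
[scrapyd]
; directories scrapyd will use for its state -- paths are placeholders
eggs_dir  = /var/lib/scrapyd/eggs
dbs_dir   = /var/lib/scrapyd/dbs
items_dir = /var/lib/scrapyd/items
logs_dir  = /var/log/scrapyd
http_port = 6800
```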

I have more than 10 spiders. Using a remote Ubuntu server, I want each spider to scrape periodically (once a week, for example) and send the data to MongoDB. Most of the spiders don't have to run simultaneously.

What is the best approach to run scrapyd as a service and run its spiders periodically in my Ubuntu server?
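(Since Ubuntu 14.04 uses Upstart rather than systemd, one way to get service behaviour back for a virtualenv install is an Upstart job pointing at the virtualenv's `scrapyd` binary. A sketch, assuming the virtualenv lives at `/home/user/venv` and should run as user `user`:)

```
# /etc/init/scrapyd.conf -- Upstart job (Ubuntu 14.04); paths and user are placeholders
description "scrapyd in a virtualenv"
start on runlevel [2345]
stop on runlevel [016]
respawn
setuid user
exec /home/user/venv/bin/scrapyd
```

(Then `sudo start scrapyd` / `sudo stop scrapyd` should work like any other service.)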

user2243952
  • Not sure I get your question or not... but you can run `scrapyd` in background ... and then schedule your spider in Cron like this `curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider` – Umair Ayub Mar 15 '17 at 15:58
  • Run scrapyd as a background task (I have found screen to be useful) with a supervisor (supervisord) – Verbal_Kint Mar 15 '17 at 22:47
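(Following the cron suggestion above, the weekly schedule could be a crontab entry per spider, hitting scrapyd's `schedule.json` endpoint. A sketch, where `myproject`/`somespider` are placeholders:)

```
# crontab -e : run somespider every Monday at 03:00
0 3 * * 1  curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider
```

(Staggering the minute/hour fields per spider keeps them from running simultaneously.)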
