Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. The project used to be part of `scrapy` itself, but was separated out and is now a standalone project. It runs on a machine and allows you to deploy (that is, upload) your projects and control the spiders they contain using a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.

355 questions
1 vote, 0 answers

In scrapyd, how to pass FEED_URI value through schedule api

I want scrapyd to run my spider in the cloud, so I want to pass the value of FEED_URI as a parameter in the scrapyd command. Currently I am using the following configuration in custom settings to store the output on S3: custom_settings = { …
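One possible approach, sketched here rather than confirmed by an answer: Scrapyd's documented `schedule.json` endpoint accepts a repeatable `setting` field of the form `NAME=VALUE`, so a feed location can in principle be sent with each scheduling call. The sketch below only builds the POST request and does not send it. The server address, project name, spider name, and bucket path are hypothetical placeholders, and `build_schedule_request` is an illustrative helper, not part of Scrapyd.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider, settings=None, spider_args=None):
    """Build (but do not send) a POST request for Scrapyd's schedule.json."""
    fields = [("project", project), ("spider", spider)]
    # Each Scrapy setting override travels as its own "setting=NAME=VALUE" field.
    for name, value in (settings or {}).items():
        fields.append(("setting", f"{name}={value}"))
    # Any remaining fields are handed to the spider as arguments.
    fields.extend((spider_args or {}).items())
    body = urlencode(fields).encode("ascii")
    return Request(f"{base_url.rstrip('/')}/schedule.json", data=body, method="POST")


# All values below are hypothetical placeholders.
request = build_schedule_request(
    "http://localhost:6800",
    project="myproject",
    spider="myspider",
    settings={"FEED_URI": "s3://example-bucket/output.json"},
)
```

Passing `request` to `urllib.request.urlopen` would submit the job to a running Scrapyd instance; whether the override takes effect for a given feed configuration should be checked against the Scrapy version in use.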
1 vote, 1 answer

Can't add a .egg file to scrapyd addversion.json

The problem is that I can't upload my .egg file to scrapyd using curl http://127.0.0.1:6800/addversion.json -F project=scraper_app -F version=r1 egg=@scraper_app-0.0.1-py3.8.egg It returns an error message like this: {"node_name":…
1 vote, 1 answer

Spidermon: How to include message in telegram bot?

I am trying to implement spidermon to monitor…
merlin
1 vote, 1 answer

Scrapyd: How to cancel all jobs with one command?

I am running over 40 spiders which until now have been scheduled via cron and issued via scrapy crawl. For several reasons I am now switching to scrapyd, one of them being to be able to see which jobs are running in case I need to do maintenance and reboot…
merlin
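A hedged sketch of one way to approach this with the documented web service: `listjobs.json` reports jobs for a project under `pending`, `running`, and `finished`, and `cancel.json` takes a `project` and a `job` id. The helper below (an illustrative name, not a Scrapyd function) turns an already parsed `listjobs.json` response into one cancel request per unfinished job, without sending anything. The server address, project name, and sample response are made up for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_cancel_requests(base_url, project, listjobs_response):
    """Return one unsent cancel.json POST request per pending or running job."""
    cancel_reqs = []
    for state in ("pending", "running"):
        for job in listjobs_response.get(state, []):
            body = urlencode({"project": project, "job": job["id"]}).encode("ascii")
            cancel_reqs.append(
                Request(f"{base_url.rstrip('/')}/cancel.json", data=body, method="POST")
            )
    return cancel_reqs


# Hypothetical, hand-written stand-in for a parsed listjobs.json response.
sample_response = {
    "status": "ok",
    "pending": [{"id": "job-a", "spider": "spider1"}],
    "running": [{"id": "job-b", "spider": "spider2"}],
    "finished": [{"id": "job-c", "spider": "spider3"}],
}

cancel_reqs = build_cancel_requests("http://localhost:6800", "myproject", sample_response)
```

Each request could then be sent with `urllib.request.urlopen`. Since `listjobs.json` works per project, a server hosting several projects would need the same loop once per project.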
1 vote, 1 answer

scrapyd-deploy with "deploy failed (400)"

I am trying to deploy with scrapyd-deploy to a remote scrapyd server, which fails without an error message: % scrapyd-deploy …
merlin
1 vote, 1 answer

Scrapyd-Deploy: Errors due to using os path to set directory

I am trying to deploy a scrapy project via scrapyd-deploy to a remote scrapyd server. The project itself is functional and works perfectly on my local machine and on the remote server when I deploy it via git push prod to the remote server. With…
merlin
1 vote, 1 answer

Scrapyd: No active project - How to schedule spiders with scrapyd

I am trying to schedule a scrapy 2.1.0 spider with the help of scrapyd 1.2: curl --insecure http://localhost:6800/schedule.json -d project=bid -d spider=test This should in theory start the crawl for spider test within project bid. Instead it…
merlin
1 vote, 0 answers

How to check the output of uploaded egg archive on scrapyd server?

I uploaded an egg archive to the remote scrapyd server via the curl command and got the following output: {"node_name": "iZwz9hlgd55ocg1gzzv0v9Z", "status": "ok", "project": "hi_there", "version": "r01", "spiders": 0} And I want to check the…
1 vote, 1 answer

Add a new service to scrapyd from current project

I want to create a custom service for scrapyd from the scrapy project, but I keep getting an error. I created crawler/webservice.py: from scrapyd.webservice import WsResource class BackInTime(WsResource): def render_GET(self, txtrequest): …
1 vote, 1 answer

Updating Django Model After Serialization from DRF

I currently have an API endpoint that receives data from the client and kicks off a scrapy crawler job. The problem is I need to create the Job model instance, kick off the scrapy job, then update the model with the task_id returned by the scrapy…
jlahut
1 vote, 1 answer

Docker compose: working directory needs to be an absolute path

I have this folder structure: services/ web/ project/ api/ scrapy.cfg And this is how I build my container which needs scrapy.cfg: web: build: context: ./services/web …
8-Bit Borges
1 vote, 2 answers

How to locate scrapyd logs file on Linux server?

The log file generated by scrapyd occupies too much memory. I want to find it and delete it. How can I find it?
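As a rough sketch for this kind of cleanup: Scrapyd's configuration has a `logs_dir` option (default `logs`), under which job logs are typically stored as `<project>/<spider>/<job>.log`. Where that directory actually lives on a given server is an assumption that has to be checked in the local Scrapyd configuration. The illustrative helper below lists `.log` files under a directory, largest first, and the demo runs on a throwaway directory instead of a real server path.

```python
import tempfile
from pathlib import Path


def log_files_by_size(logs_dir):
    """Return (size_in_bytes, path) pairs for *.log files, largest first."""
    entries = [
        (path.stat().st_size, path)
        for path in Path(logs_dir).rglob("*.log")
        if path.is_file()
    ]
    return sorted(entries, key=lambda entry: entry[0], reverse=True)


# Demo on a temporary directory that mimics logs_dir/<project>/<spider>/<job>.log.
# The project, spider, and job names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    job_dir = Path(tmp) / "myproject" / "myspider"
    job_dir.mkdir(parents=True)
    (job_dir / "job1.log").write_text("x" * 10)
    (job_dir / "job2.log").write_text("x" * 200)
    report = [(size, path.name) for size, path in log_files_by_size(tmp)]
```

On a real server the function would be pointed at the configured `logs_dir`; deleting the reported files is left as a separate, deliberate step.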
1 vote, 2 answers

unable to deploy scrapy to scrapyd server

I am trying to deploy my scrapy project, which is connected to a django project, to scrapyd, but when I tried scrapyd-deploy JD -p JDSpider, it failed. It said No module named GradutionProject. It seems that scrapyd cannot detect "GradutionProject.settings" in…
Zheyuuu
1 vote, 2 answers

Make scrapyd overwrite files

I'm looking for a way to set up a job in scrapyd, which creates a file locally and then overwrites its content at the next crawl, instead of appending it. As far as I know, using scrapy only I could use this command scrapy crawl myspider -t json…
FabioM
1 vote, 1 answer

Deploy Scrapy project to remote Scrapyd service error

I tried to deploy a test Scrapy project to the remote Scrapyd server. I got the following error message on the client side: curl http://IP:6800/addversion.json -d project=test_project -d spider=quotes {"status": "error", "message": "'version'",…
Tester