Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `scrapy` itself, but was split out and is now a standalone project. It runs as a service on a machine and lets you deploy (i.e., upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
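
A minimal sketch of driving that JSON web service from Python, assuming a scrapyd instance on the default `localhost:6800`; the names `myproject` and `myspider` are placeholders:

```python
import requests

SCRAPYD = 'http://localhost:6800'  # scrapyd's default address and port

# schedule a spider run; scrapyd replies with a job id
resp = requests.post(f'{SCRAPYD}/schedule.json',
                     data={'project': 'myproject', 'spider': 'myspider'})
print(resp.json())  # e.g. {'status': 'ok', 'jobid': '...'}

# inspect pending/running/finished jobs for the project
jobs = requests.get(f'{SCRAPYD}/listjobs.json',
                    params={'project': 'myproject'}).json()
print(jobs['running'])
```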

355 questions
0 votes · 1 answer

scrapyd-deploy error: pkg_resources.DistributionNotFound

I have been trying for a long time to find a solution to the scrapyd error message: pkg_resources.DistributionNotFound: The 'idna<3,>=2.5' distribution was not found and is required by requests. What I have done: $ docker pull ceroic/scrapyd $ docker…
Vraja · 97 · 2 · 6
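
The error message itself points at the fix: the image ships an `idna` that no longer satisfies the pin declared by `requests`. A hedged workaround, assuming the `ceroic/scrapyd` image from the question installs its Python packages with pip, is to reinstall `idna` at a matching version in a derived image:

```dockerfile
FROM ceroic/scrapyd
# reinstall idna at a version satisfying requests' 'idna<3,>=2.5' pin
RUN pip install "idna>=2.5,<3"
```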
0 votes · 0 answers

How to send a list of numbers / strings with scrapyd to a spider when using scrapyd.schedule

I'm trying to start my scrapy bot from a Django application and I need to pass in a list of strings and also a list of numbers that the bot requires to function. This is my code in the views.py of my Django application: task =…
sam rafiei · 51 · 1 · 6
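
Scrapyd delivers spider arguments as plain strings, so one common pattern (a sketch, not the asker's code; project and spider names are placeholders, and `python-scrapyd-api` is assumed for the `schedule` call) is to JSON-encode the lists in Django and decode them in the spider:

```python
import json

import scrapy
from scrapyd_api import ScrapydAPI  # pip install python-scrapyd-api

# --- Django side: encode the lists before scheduling ---
scrapyd = ScrapydAPI('http://localhost:6800')
task = scrapyd.schedule(
    'myproject', 'mybot',
    urls=json.dumps(['https://a.example', 'https://b.example']),
    limits=json.dumps([10, 20]),
)

# --- spider side: decode them back in __init__ ---
class MyBot(scrapy.Spider):
    name = 'mybot'

    def __init__(self, urls='[]', limits='[]', *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.urls = json.loads(urls)      # list of strings again
        self.limits = json.loads(limits)  # list of numbers again
```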
0 votes · 1 answer

Send a JSON object from memory by FTP

I've deployed a spider to scrapyd. In development the spider was writing a file to disk; deployed, no file is produced. I believe it is a permission problem. I'm looking to FTP the data out, so solution 1 would be not to write a file at all. Is…
jim Burns · 11 · 3
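
For "solution 1" (never touching disk), the standard library is enough: serialize the items into an in-memory buffer and hand it to `ftplib`. A sketch with placeholder host and credentials:

```python
import io
import json
from ftplib import FTP

def ftp_json(items, host, user, password, remote_name='items.json'):
    # serialize straight into an in-memory bytes buffer; nothing hits disk
    buf = io.BytesIO(json.dumps(items).encode('utf-8'))
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.storbinary(f'STOR {remote_name}', buf)

ftp_json([{'title': 'example'}], 'ftp.example.com', 'user', 'secret')
```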
0 votes · 1 answer

ScrapydWeb: Connection refused within docker-compose

I tried to run a couple of scrapyd services to have a simple cluster on my localhost, but only the first node works. For the 2 others I get the following error: scrapydweb_1 | [2020-11-17 07:17:32,738] ERROR in scrapydweb.utils.check_app_config:…
amarynets · 1,765 · 10 · 27
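
In a docker-compose setup, "connection refused" between containers is often just scrapyd's default `bind_address` of 127.0.0.1, which makes the daemon unreachable from the ScrapydWeb container. A hedged fix is to bind each scrapyd node to all interfaces, e.g. via a scrapyd.conf baked into the node images:

```ini
[scrapyd]
bind_address = 0.0.0.0
http_port    = 6800
```

ScrapydWeb's `SCRAPYD_SERVERS` entries would then reference the compose service names (say, `scrapyd1:6800`, `scrapyd2:6800`) rather than 127.0.0.1.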
0 votes · 1 answer

set the format of the scrapyd output file

I am using scrapy to collect data and running spiders via scrapyd. The file with the results is added by default to /data/scrapyd/items/{spider_name}/{job_id}.jl, where job_id is assigned by scrapyd. Please tell me if it is possible to manually specify…
virvaldium · 226 · 3 · 13
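
The feed scrapyd injects via `items_dir` is fixed to jsonlines. One option (a sketch, assuming Scrapy 2.1+ and that `items_dir` is left empty in scrapyd.conf so the two feeds don't both fire) is to declare your own feed in the project settings with the `FEEDS` setting:

```python
# settings.py -- the export path here is hypothetical
FEEDS = {
    '/data/exports/%(name)s/%(time)s.csv': {
        'format': 'csv',   # any installed feed exporter: csv, json, xml, ...
        'encoding': 'utf8',
    },
}
```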
0 votes · 1 answer

Scrapy - new instance of Item Pipeline classes per process/job?

I use Scrapyd for scheduling and launching spider jobs. In my Item Pipeline classes I set job-specific variables on the class, which should not be shared by other spiders/jobs. So my question is: does Scrapy/Scrapyd create a new instance of pipeline…
Mon B. · 1
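
Scrapyd's process model answers this: each scheduled job runs as its own `scrapy crawl` subprocess, and pipelines are instantiated once per crawler inside that process, so instance state is never shared across jobs. A small sketch of job-scoped state:

```python
import os

class JobScopedPipeline:
    def open_spider(self, spider):
        # each scrapyd job is a separate OS process, so this instance
        # is private to the job; SCRAPY_JOB is set by scrapyd's launcher
        self.job_id = os.environ.get('SCRAPY_JOB', 'local')
        self.seen_ids = set()  # job-specific, never visible to other jobs

    def process_item(self, item, spider):
        self.seen_ids.add(item.get('id'))
        return item
```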
0 votes · 1 answer

Scrapy request chaining not working with Spider Middleware

Similar to what is done in the linked question "How can i use multiple requests and pass items in between them in scrapy python", I am trying to chain requests from spiders as in Dave McLain's answer. Returning a request object from the parse function works fine,…
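
Independent of the middleware issue, the usual chaining pattern carries the half-built item on the request itself; `cb_kwargs` (available since Scrapy 1.7) is the documented way to do that. A sketch with placeholder URLs:

```python
import scrapy

class ChainSpider(scrapy.Spider):
    name = 'chain'
    start_urls = ['https://example.com/list']

    def parse(self, response):
        item = {'list_url': response.url}
        # hand the partially-filled item to the next callback
        yield scrapy.Request(response.urljoin('/detail'),
                             callback=self.parse_detail,
                             cb_kwargs={'item': item})

    def parse_detail(self, response, item):
        item['detail_url'] = response.url
        yield item  # fully assembled item
```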
0 votes · 0 answers

Scrapy User Agents Blocked or Doesn't Work on Remote Server

I'm using Scrapy 2.3 with the library scrapy_fake_useragents to scrape a major e-commerce website. When I run the spider on my local computer, scrapy will rotate user agents per the library and will scrape the information I need, bypassing the…
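
For reference, the documented wiring for `scrapy-fake-useragent` (the PyPI package behind the library named in the question) replaces Scrapy's built-in user-agent middleware; checking that the deployed settings actually contain this block is a sensible first step when rotation works locally but not on the server:

```python
DOWNLOADER_MIDDLEWARES = {
    # disable Scrapy's static user agent ...
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # ... and let scrapy-fake-useragent rotate one per request
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}
```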
0 votes · 1 answer

Run scrapyd in Python 3.6

I've been looking around and I can't seem to find an answer on how to run scrapyd in Python 3 and above. When I run it, it keeps defaulting to python 2.7, though I recall reading in the docs or elsewhere that scrapyd supports…
Thorvald · 546 · 6 · 18
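
Scrapyd runs under whichever interpreter's `pip` installed it, so the straightforward fix is to install it inside a Python 3.6 virtual environment and start it from there:

```bash
python3.6 -m venv ~/scrapyd-venv
source ~/scrapyd-venv/bin/activate
pip install scrapyd   # the scrapyd entry point is now bound to python3.6
scrapyd
```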
0 votes · 1 answer

Unable to access scrapyd interface on the server machine with public IP

I am trying to run scrapyd on my Ubuntu server, which has a public IP, using the following config file named scrapy.cfg: [settings] default = web_crawler.settings [deploy:default] url = http://127.0.0.1:6800/ project = web_crawler [scrapyd] eggs_dir =…
Amanda · 2,013 · 3 · 24 · 57
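
The knob that matters here is `bind_address`, which defaults to 127.0.0.1 (loopback only), and scrapyd reads it from its own configuration file. A sketch:

```ini
# /etc/scrapyd/scrapyd.conf (one of the locations scrapyd reads)
[scrapyd]
bind_address = 0.0.0.0   # listen on all interfaces, not just loopback
http_port    = 6800
```

Since scrapyd's API has no authentication by default, exposing it on a public IP is usually paired with a firewall rule or reverse proxy.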
0 votes · 0 answers

Multiple scrapy projects to one scrapyd project

I have multiple scrapy spiders. For every spider I have a separate scrapy project, like this: Scrapy project 1 -> spider 1, Scrapy project 2 -> spider 2. When I deploy one project to scrapyd it works fine and says there is one spider. But when I try to…
CIC3RO · 13 · 4
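
Scrapyd hosts any number of projects side by side; what keeps them apart is the project name passed at deploy time. A sketch, assuming both projects' scrapy.cfg files point at the same scrapyd target:

```bash
# in Scrapy project 1's directory
scrapyd-deploy default -p project1
# in Scrapy project 2's directory
scrapyd-deploy default -p project2
# both projects (one spider each) should now be listed:
curl http://localhost:6800/listprojects.json
```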
0 votes · 1 answer

Scrapyd: How to write data to json file?

I have a working scrapy 2.1.0 project where I write data to a json file: def open_spider(self, spider): self.file = open('data/' + datetime.datetime.now().strftime("%Y%m%d") + '_' + spider.name + '.json', 'wb') self.exporter =…
merlin · 2,717 · 3 · 29 · 59
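
When the same pipeline works locally but not under scrapyd, the usual suspect is the relative 'data/' path: scrapyd starts the job from its own working directory, not the project checkout. A sketch that keeps the naming scheme but anchors it to an absolute, writable location (the directory is hypothetical):

```python
import datetime
import os

class JsonWriterPipeline:
    def open_spider(self, spider):
        out_dir = '/var/lib/scrapyd/exports'   # must be writable by scrapyd's user
        os.makedirs(out_dir, exist_ok=True)
        filename = (datetime.datetime.now().strftime('%Y%m%d')
                    + '_' + spider.name + '.json')
        self.file = open(os.path.join(out_dir, filename), 'wb')
```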
0 votes · 1 answer

scrapyd stops after one second without error messages in logfile

I am running scrapyd 1.2 with scrapy version 2.1 and suddenly the daemon stopped working properly. It will schedule jobs, but they end after one second with status "finished", and the log file of the spider shows this as the last line: 2020-05-17…
merlin · 2,717 · 3 · 29 · 59
0 votes · 1 answer

How to retrieve scrapy job id within a method?

I am trying to get the job id of a scrapy 2.1.x job in the spider_close method: class mysql_pipeline(object): import os def test: print(os.environ['SCRAPY_JOB']) Unfortunately this results in a key error: ERROR: Scraper close…
merlin · 2,717 · 3 · 29 · 59
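
SCRAPY_JOB is only placed in the environment when the spider runs under scrapyd, so a KeyError is expected whenever the job isn't launched by scrapyd. A sketch that degrades gracefully:

```python
import os

class MysqlPipeline:
    def close_spider(self, spider):
        # set by scrapyd for scheduled jobs; absent in a plain `scrapy crawl`
        job_id = os.environ.get('SCRAPY_JOB', 'local-run')
        spider.logger.info('closing job %s', job_id)
```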
0 votes · 1 answer

Scrapyd: How to retrieve spiders or version of a scrapyd project?

It appears that either the documentation of scrapyd is wrong or there is a bug. I want to retrieve the list of spiders from a deployed project. The docs tell me to do it this way: curl http://localhost:6800/listspiders.json?project=myproject So…
merlin · 2,717 · 3 · 29 · 59
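
The endpoints themselves are as documented; one thing that does bite with curl is an unquoted URL once `&` joins a second query parameter, so a shell-proof check is to let a client build the query string. A sketch (project name is a placeholder):

```python
import requests

base = 'http://localhost:6800'
for endpoint in ('listspiders.json', 'listversions.json'):
    resp = requests.get(f'{base}/{endpoint}', params={'project': 'myproject'})
    print(endpoint, resp.json())  # {'status': 'ok', ...} on success
```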