Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `scrapy` itself, but was split out and is now a standalone project. It runs as a service on a machine and lets you deploy (i.e., upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
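
A minimal sketch of driving that JSON web service from Python, assuming a scrapyd instance on the default `localhost:6800`; the names `myproject` and `myspider` are placeholders:

```python
import requests

SCRAPYD = 'http://localhost:6800'  # scrapyd's default address and port

# schedule a spider run; scrapyd replies with a job id
resp = requests.post(f'{SCRAPYD}/schedule.json',
                     data={'project': 'myproject', 'spider': 'myspider'})
print(resp.json())  # e.g. {'status': 'ok', 'jobid': '...'}

# inspect pending/running/finished jobs for the project
jobs = requests.get(f'{SCRAPYD}/listjobs.json',
                    params={'project': 'myproject'}).json()
print(jobs['running'])
```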

355 questions
0 votes · 1 answer

scrapyd-deploy error: pkg_resources.DistributionNotFound

I have been trying for a long time to find a solution to the scrapyd error message: pkg_resources.DistributionNotFound: The 'idna<3,>=2.5' distribution was not found and is required by requests. What I have done: $ docker pull ceroic/scrapyd $ docker…
Vraja · 97 · 2 · 6
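
The error message itself points at the fix: the image ships an `idna` that no longer satisfies the pin declared by `requests`. A hedged workaround, assuming the `ceroic/scrapyd` image from the question installs its Python packages with pip, is to reinstall `idna` at a matching version in a derived image:

```dockerfile
FROM ceroic/scrapyd
# reinstall idna at a version satisfying requests' 'idna<3,>=2.5' pin
RUN pip install "idna>=2.5,<3"
```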
0 votes · 0 answers

How to send a list of numbers / strings with scrapyd to a spider when using scrapyd.schedule

I'm trying to start my scrapy bot from a Django application and I need to pass in a list of strings and also a list of numbers that the bot requires to function. This is my code in the views.py of my Django application: task =…
sam rafiei · 51 · 1 · 6
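
Scrapyd delivers spider arguments as plain strings, so one common pattern (a sketch, not the asker's code; project and spider names are placeholders, and `python-scrapyd-api` is assumed for the `schedule` call) is to JSON-encode the lists in Django and decode them in the spider:

```python
import json

import scrapy
from scrapyd_api import ScrapydAPI  # pip install python-scrapyd-api

# --- Django side: encode the lists before scheduling ---
scrapyd = ScrapydAPI('http://localhost:6800')
task = scrapyd.schedule(
    'myproject', 'mybot',
    urls=json.dumps(['https://a.example', 'https://b.example']),
    limits=json.dumps([10, 20]),
)

# --- spider side: decode them back in __init__ ---
class MyBot(scrapy.Spider):
    name = 'mybot'

    def __init__(self, urls='[]', limits='[]', *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.urls = json.loads(urls)      # list of strings again
        self.limits = json.loads(limits)  # list of numbers again
```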
0 votes · 1 answer

Send a JSON object from memory by FTP

I've deployed a spider to scrapyd. In development the spider was writing a file to disk; deployed, no file is produced. I believe it is a permission problem. I'm looking to FTP the data out, so solution 1 would be not to write a file at all. Is…
jim Burns · 11 · 3
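
For "solution 1" (never touching disk), the standard library is enough: serialize the items into an in-memory buffer and hand it to `ftplib`. A sketch with placeholder host and credentials:

```python
import io
import json
from ftplib import FTP

def ftp_json(items, host, user, password, remote_name='items.json'):
    # serialize straight into an in-memory bytes buffer; nothing hits disk
    buf = io.BytesIO(json.dumps(items).encode('utf-8'))
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.storbinary(f'STOR {remote_name}', buf)

ftp_json([{'title': 'example'}], 'ftp.example.com', 'user', 'secret')
```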
0 votes · 1 answer

ScrapydWeb: Connection refused within docker-compose

I tried to run a couple of scrapyd services to have a simple cluster on my localhost, but only the first node works. For the 2 others I get the following error: scrapydweb_1 | [2020-11-17 07:17:32,738] ERROR in scrapydweb.utils.check_app_config:…
amarynets · 1,765 · 10 · 27
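
In a docker-compose setup, "connection refused" between containers is often just scrapyd's default `bind_address` of 127.0.0.1, which makes the daemon unreachable from the ScrapydWeb container. A hedged fix is to bind each scrapyd node to all interfaces, e.g. via a scrapyd.conf baked into the node images:

```ini
[scrapyd]
bind_address = 0.0.0.0
http_port    = 6800
```

ScrapydWeb's `SCRAPYD_SERVERS` entries would then reference the compose service names (say, `scrapyd1:6800`, `scrapyd2:6800`) rather than 127.0.0.1.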
0 votes · 1 answer

set the format of the scrapyd output file

I am using scrapy to collect data and running spiders via scrapyd. The file with the results is added by default to /data/scrapyd/items/{spider_name}/{job_id}.jl, where job_id is assigned by scrapyd. Please tell me if it is possible to manually specify…
virvaldium · 226 · 3 · 13
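
The feed scrapyd injects via `items_dir` is fixed to jsonlines. One option (a sketch, assuming Scrapy 2.1+ and that `items_dir` is left empty in scrapyd.conf so the two feeds don't both fire) is to declare your own feed in the project settings with the `FEEDS` setting:

```python
# settings.py -- the export path here is hypothetical
FEEDS = {
    '/data/exports/%(name)s/%(time)s.csv': {
        'format': 'csv',   # any installed feed exporter: csv, json, xml, ...
        'encoding': 'utf8',
    },
}
```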
0 votes · 1 answer

Scrapy - new instance of Item Pipeline classes per process/job?

I use Scrapyd for scheduling and launching spider jobs. In my Item Pipeline classes I set job-specific variables on the class, which should not be shared by other spiders/jobs. So my question is: does Scrapy/Scrapyd create a new instance of pipeline…
Mon B. · 1
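
Scrapyd's process model answers this: each scheduled job runs as its own `scrapy crawl` subprocess, and pipelines are instantiated once per crawler inside that process, so instance state is never shared across jobs. A small sketch of job-scoped state:

```python
import os

class JobScopedPipeline:
    def open_spider(self, spider):
        # each scrapyd job is a separate OS process, so this instance
        # is private to the job; SCRAPY_JOB is set by scrapyd's launcher
        self.job_id = os.environ.get('SCRAPY_JOB', 'local')
        self.seen_ids = set()  # job-specific, never visible to other jobs

    def process_item(self, item, spider):
        self.seen_ids.add(item.get('id'))
        return item
```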
0 votes · 1 answer

Scrapy request chaining not working with Spider Middleware

Similar to what is done in the linked question "How can i use multiple requests and pass items in between them in scrapy python", I am trying to chain requests from spiders as in Dave McLain's answer. Returning a request object from the parse function works fine,…
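
Independent of the middleware issue, the usual chaining pattern carries the half-built item on the request itself; `cb_kwargs` (available since Scrapy 1.7) is the documented way to do that. A sketch with placeholder URLs:

```python
import scrapy

class ChainSpider(scrapy.Spider):
    name = 'chain'
    start_urls = ['https://example.com/list']

    def parse(self, response):
        item = {'list_url': response.url}
        # hand the partially-filled item to the next callback
        yield scrapy.Request(response.urljoin('/detail'),
                             callback=self.parse_detail,
                             cb_kwargs={'item': item})

    def parse_detail(self, response, item):
        item['detail_url'] = response.url
        yield item  # fully assembled item
```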
0 votes · 0 answers

Scrapy User Agents Blocked or Doesn't Work on Remote Server

I'm using Scrapy 2.3 with the library scrapy_fake_useragents to scrape a major e-commerce website. When I run the spider on my local computer, scrapy will rotate user agents per the library and will scrape the information I need, bypassing the…
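
For reference, the documented wiring for `scrapy-fake-useragent` (the PyPI package behind the library named in the question) replaces Scrapy's built-in user-agent middleware; checking that the deployed settings actually contain this block is a sensible first step when rotation works locally but not on the server:

```python
DOWNLOADER_MIDDLEWARES = {
    # disable Scrapy's static user agent ...
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # ... and let scrapy-fake-useragent rotate one per request
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}
```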
0 votes · 1 answer

Run scrapyd in Python 3.6

I've been looking around and I can't seem to find an answer on how to run scrapyd in Python 3 and above. When I run it, it keeps defaulting to python 2.7, though I recall reading in the docs or elsewhere that scrapyd supports…
Thorvald · 546 · 6 · 18
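
Scrapyd runs under whichever interpreter's `pip` installed it, so the straightforward fix is to install it inside a Python 3.6 virtual environment and start it from there:

```bash
python3.6 -m venv ~/scrapyd-venv
source ~/scrapyd-venv/bin/activate
pip install scrapyd   # the scrapyd entry point is now bound to python3.6
scrapyd
```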
0 votes · 1 answer

Unable to access scrapyd interface on the server machine with public IP

I am trying to run scrapyd on my Ubuntu server, which has a public IP, using the following config file named scrapy.cfg: [settings] default = web_crawler.settings [deploy:default] url = http://127.0.0.1:6800/ project = web_crawler [scrapyd] eggs_dir =…
Amanda · 2,013 · 3 · 24 · 57
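
The knob that matters here is `bind_address`, which defaults to 127.0.0.1 (loopback only), and scrapyd reads it from its own configuration file. A sketch:

```ini
# /etc/scrapyd/scrapyd.conf (one of the locations scrapyd reads)
[scrapyd]
bind_address = 0.0.0.0   # listen on all interfaces, not just loopback
http_port    = 6800
```

Since scrapyd's API has no authentication by default, exposing it on a public IP is usually paired with a firewall rule or reverse proxy.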
0 votes · 0 answers

Multiple scrapy projects to one scrapyd project

I have multiple scrapy spiders. For every spider I have a separate scrapy project, like this: Scrapy project 1 -> spider 1, Scrapy project 2 -> spider 2. When I deploy one project to scrapyd it works fine and says there is one spider. But when I try to…
CIC3RO · 13 · 4
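
Scrapyd hosts any number of projects side by side; what keeps them apart is the project name passed at deploy time. A sketch, assuming both projects' scrapy.cfg files point at the same scrapyd target:

```bash
# in Scrapy project 1's directory
scrapyd-deploy default -p project1
# in Scrapy project 2's directory
scrapyd-deploy default -p project2
# both projects (one spider each) should now be listed:
curl http://localhost:6800/listprojects.json
```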
0 votes · 1 answer

Scrapyd: How to write data to json file?

I have a working scrapy 2.1.0 project where I write data to a json file: def open_spider(self, spider): self.file = open('data/' + datetime.datetime.now().strftime("%Y%m%d") + '_' + spider.name + '.json', 'wb') self.exporter =…
merlin · 2,717 · 3 · 29 · 59
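
When the same pipeline works locally but not under scrapyd, the usual suspect is the relative 'data/' path: scrapyd starts the job from its own working directory, not the project checkout. A sketch that keeps the naming scheme but anchors it to an absolute, writable location (the directory is hypothetical):

```python
import datetime
import os

class JsonWriterPipeline:
    def open_spider(self, spider):
        out_dir = '/var/lib/scrapyd/exports'   # must be writable by scrapyd's user
        os.makedirs(out_dir, exist_ok=True)
        filename = (datetime.datetime.now().strftime('%Y%m%d')
                    + '_' + spider.name + '.json')
        self.file = open(os.path.join(out_dir, filename), 'wb')
```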
0 votes · 1 answer

scrapyd stops after one second without error messages in logfile

I am running scrapyd 1.2 with scrapy version 2.1 and suddenly the daemon stopped working properly. It will schedule jobs, but they end after one second with status "finished", and the log file of the spider shows this as the last line: 2020-05-17…
merlin · 2,717 · 3 · 29 · 59
0 votes · 1 answer

How to retrieve scrapy job id within a method?

I am trying to get the job id of a scrapy 2.1.x job in the spider_close method: class mysql_pipeline(object): import os def test: print(os.environ['SCRAPY_JOB']) Unfortunately this results in a key error: ERROR: Scraper close…
merlin · 2,717 · 3 · 29 · 59
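
SCRAPY_JOB is only placed in the environment when the spider runs under scrapyd, so a KeyError is expected whenever the job isn't launched by scrapyd. A sketch that degrades gracefully:

```python
import os

class MysqlPipeline:
    def close_spider(self, spider):
        # set by scrapyd for scheduled jobs; absent in a plain `scrapy crawl`
        job_id = os.environ.get('SCRAPY_JOB', 'local-run')
        spider.logger.info('closing job %s', job_id)
```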
0 votes · 1 answer

Scrapyd: How to retrieve spiders or version of a scrapyd project?

It appears that either the documentation of scrapyd is wrong or there is a bug. I want to retrieve the list of spiders from a deployed project. The docs tell me to do it this way: curl http://localhost:6800/listspiders.json?project=myproject So…
merlin · 2,717 · 3 · 29 · 59
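
The endpoints themselves are as documented; one thing that does bite with curl is an unquoted URL once `&` joins a second query parameter, so a shell-proof check is to let a client build the query string. A sketch (project name is a placeholder):

```python
import requests

base = 'http://localhost:6800'
for endpoint in ('listspiders.json', 'listversions.json'):
    resp = requests.get(f'{base}/{endpoint}', params={'project': 'myproject'})
    print(endpoint, resp.json())  # {'status': 'ok', ...} on success
```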