Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `scrapy` itself, but has since been split out into a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
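Everything below goes through that JSON API; for example, scheduling a run is a single POST to `schedule.json`. A minimal standard-library sketch (the project and spider names are placeholders):

```python
import json
import urllib.parse
import urllib.request

SCRAPYD_URL = "http://localhost:6800"  # scrapyd's default port

def build_schedule_body(project, spider, **spider_args):
    """Encode the form body that schedule.json expects; extra keyword
    arguments are passed through to the spider as arguments."""
    return urllib.parse.urlencode({"project": project, "spider": spider, **spider_args})

def schedule_spider(project, spider, **spider_args):
    """POST to schedule.json; scrapyd replies with a status and a job id."""
    body = build_schedule_body(project, spider, **spider_args).encode()
    with urllib.request.urlopen(f"{SCRAPYD_URL}/schedule.json", data=body) as resp:
        return json.load(resp)
```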

355 questions
1 vote · 1 answer

Custom JSON Response from Scrapy Spider Deployed via Scrapyd

I need to find a way to make my Scrapy spider return a custom JSON response. It is deployed via scrapyd using schedule.json. Schedule.json responds with JobID and Status, but I'd like to add some more data to that response. If there's a way I could…
ChristianTL
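Scrapyd's own `schedule.json` reply is fixed to status/jobid fields, so short of writing a custom webservice resource, the usual workaround is to enrich the reply client-side. A sketch (the wrapper and its extra fields are hypothetical, not part of scrapyd):

```python
import json
import urllib.parse
import urllib.request

def annotate_reply(reply, extra):
    """Merge custom fields into scrapyd's reply; scrapyd's own keys win on conflict."""
    return {**extra, **reply}

def schedule_and_annotate(base_url, project, spider, extra):
    """Schedule a spider via schedule.json, then bolt caller-supplied fields onto the reply."""
    body = urllib.parse.urlencode({"project": project, "spider": spider}).encode()
    with urllib.request.urlopen(f"{base_url}/schedule.json", data=body) as resp:
        return annotate_reply(json.load(resp), extra)
```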
1 vote · 0 answers

Scrapyd S3 feed export "Connection Reset by Peer"

I'm running Scrapyd with a FEED_URI set to export to S3, but I received the following error at the very end of my scrape. Note that it successfully uploaded a few hundred kb of data to the bucket as the scrape began, then threw this error at the…
szxk
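For context, an S3 feed export of this era is configured with the `FEED_URI`/`FEED_FORMAT` settings plus AWS credentials, and the S3 storage backend buffers the whole feed locally and uploads it once when the crawl closes, which is why errors can surface only at the very end. A fragment with placeholder bucket and keys:

```python
# settings.py fragment -- bucket name and credentials are placeholders
FEED_URI = "s3://my-bucket/%(name)s-%(time)s.jl"
FEED_FORMAT = "jsonlines"
AWS_ACCESS_KEY_ID = "AKIA..."
AWS_SECRET_ACCESS_KEY = "..."
```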
1 vote · 2 answers

scrapyd: curl error `unknown or corrupt egg`

I'm trying to update the version of my spider. I ran: curl http://localhost:6800/addversion.json -d project=comicvn -d spider=comicvn2 -d version=141667324 -d egg=14116674324.egg It returned the error: {"status": "error", "message": "ValueError: Unknown or…
tuancoi
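For reference, `addversion.json` expects the egg as an uploaded file (curl's `-F egg=@file` syntax), not a plain `-d` field, and scrapyd-client's `scrapyd-deploy` can build and upload the egg in one step. A sketch using the question's names:

```
# upload an existing egg: the egg must be sent as a file (-F), and
# addversion.json takes project/version/egg (there is no spider parameter)
curl http://localhost:6800/addversion.json -F project=comicvn \
     -F version=141667324 -F egg=@14116674324.egg

# or build and upload in one step with scrapyd-client:
scrapyd-deploy scrapyd -p comicvn --version 141667324
```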
1 vote · 2 answers

Projects were not shown in scrapyd

I am new to scrapyd. I have inserted the code below into my scrapy.cfg file. [settings] default = uk.settings [deploy:scrapyd] url = http://localhost:6800/ project = ukmall [deploy:scrapyd2] url = http://scrapyd.mydomain.com/api/scrapyd/ username =…
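For comparison, a well-formed scrapy.cfg with one local and one remote target looks like this (the credential values are placeholders); note that a project only appears in scrapyd after it has actually been deployed to it, e.g. with `scrapyd-deploy scrapyd`, not merely by being listed in scrapy.cfg:

```ini
[settings]
default = uk.settings

[deploy:scrapyd]
url = http://localhost:6800/
project = ukmall

[deploy:scrapyd2]
url = http://scrapyd.mydomain.com/api/scrapyd/
project = ukmall
username = myuser
password = mypassword
```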
1 vote · 2 answers

Scrapyd: Writing CSV file to remote server

I'm trying to schedule a crawler on EC2 and have the output exported to a CSV file, cppages-nov.csv, while creating a jobdir in case I need to pause the crawl, but it is not creating any files. Am I using the correct feed exports? curl…
Jason Youk
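Per-run settings such as the feed URI and JOBDIR can be passed to `schedule.json` as repeated `setting=KEY=value` form fields; a standard-library sketch of building such a body (the bucket, project, and spider names are placeholders):

```python
import urllib.parse

def schedule_body_with_settings(project, spider, settings):
    """Encode a schedule.json body with repeated setting=KEY=value fields."""
    pairs = [("project", project), ("spider", spider)]
    pairs += [("setting", f"{key}={value}") for key, value in settings.items()]
    return urllib.parse.urlencode(pairs)

body = schedule_body_with_settings(
    "myproject",
    "pagespider",
    {"FEED_URI": "s3://my-bucket/cppages-nov.csv",
     "FEED_FORMAT": "csv",
     "JOBDIR": "crawls/pagespider-1"},
)
# POST `body` to http://localhost:6800/schedule.json
```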
1 vote · 1 answer

Restricting access to port 6800

I've recently set up my first Ubuntu server and installed scrapy and scrapyd. I've written a few spiders, and I've figured out how to execute the spiders through the API on port 6800. I also noticed there's a web interface there. I've also noticed…
Chad Casey
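Scrapyd itself ships no authentication, so the usual fixes are binding it to localhost in scrapyd.conf and reaching it through an SSH tunnel or an authenticating reverse proxy, and/or firewalling port 6800. For example:

```ini
# scrapyd.conf -- only reachable from the machine itself
[scrapyd]
bind_address = 127.0.0.1
http_port = 6800
```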
1 vote · 1 answer

Scrapyd Permission Denied on Deploy

I'm very new to Scrapyd, and am trying to deploy. I am running on Ubuntu 12.04 and installed the ubuntu version of Scrapyd. When I run scrapy deploy default -p pull_scrapers it returns Packing version 1407616523 Deploying to project "pull_scrapers"…
robert
1 vote · 1 answer

Change the number of running spiders in scrapyd

Hey, so I have about 50 spiders in my project and I'm currently running them via the scrapyd server. I'm running into an issue where some of the resources I use get locked and make my spiders fail or run really slowly. I was hoping there was some way to…
rocktheartsm4l
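The size of scrapyd's process pool is set in scrapyd.conf; a sketch with illustrative values:

```ini
[scrapyd]
# hard cap on concurrent spider processes (0 = no fixed cap)
max_proc = 4
# used only when max_proc is 0: limit is CPUs * max_proc_per_cpu
max_proc_per_cpu = 2
```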
1 vote · 2 answers

Launching Scrapyd with multiple configurations

I'm trying to develop my Scrapy application using multiple configurations depending on my environment (e.g. development, production). My problem is that there are some settings that I'm not sure how to set. For example, if I have to set up my…
ivangoblin
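One common pattern is a single settings.py that applies per-environment overrides chosen by an environment variable; a sketch (the `SCRAPY_ENV` name and the override values are assumptions, not a Scrapy convention):

```python
import os

# settings.py -- shared defaults first
BOT_NAME = "myproject"
DOWNLOAD_DELAY = 0.5

_ENV_OVERRIDES = {
    "development": {"HTTPCACHE_ENABLED": True, "LOG_LEVEL": "DEBUG"},
    "production": {"HTTPCACHE_ENABLED": False, "LOG_LEVEL": "INFO"},
}

def env_overrides(env, overrides=_ENV_OVERRIDES):
    """Pick the override dict for the current environment (default: development)."""
    return overrides.get(env, overrides["development"])

# apply the chosen overrides as ordinary module-level settings
globals().update(env_overrides(os.environ.get("SCRAPY_ENV", "development")))
```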
1 vote · 2 answers

Scrapy - Load a yaml file with a relative path inside the spider

I'm trying to deploy my scrapy crawlers, but the problem is that I have a yaml file that I'm trying to load from inside the spider, this works when the spider is loaded from the shell: scrapy crawl . But when the spider is deployed…
Hakim
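Under scrapyd the project is imported from an egg, so `open()` with a path relative to the working directory (and sometimes even `__file__`) no longer resolves; reading the file through the package loader works in both cases. A sketch (the package and file names in the comment are placeholders):

```python
import pkgutil

def load_bundled_resource(package, filename):
    """Return the text of a data file shipped inside the (possibly zipped) project egg."""
    raw = pkgutil.get_data(package, filename)
    if raw is None:
        raise FileNotFoundError(f"{filename!r} not found in package {package!r}")
    return raw.decode("utf-8")

# inside a spider, e.g.:
#   import yaml
#   config = yaml.safe_load(load_bundled_resource("myproject", "config.yaml"))
```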
1 vote · 1 answer

Schedule spider with SCRAPYD

I'm trying to schedule a spider run. I ran: curl http://localhost:6800/schedule.json -d project=elettronica -d spider=Prokoo It returned: {"status": "error", "message": "'elettronica'"} In scrapyd.log I see: 2014-04-16 17:55:16+0200…
1 vote · 1 answer

How to optimize Scrapyd settings for 200+ spiders

My scrapyd handles 200 spiders at once daily. Yesterday, the server crashed because RAM hit its cap. I am using the default scrapyd settings: [scrapyd] http_port = 6800 debug = off #max_proc = 1 eggs_dir = /var/lib/scrapyd/eggs dbs_dir =…
Michael Nguyen
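The scrapyd.conf knobs that matter most here are the process cap and the job-history sizes; a sketch of a tighter configuration (the values are illustrative, not recommendations):

```ini
[scrapyd]
http_port = 6800
# cap concurrent spider processes instead of letting the pool grow with jobs
max_proc = 8
# used only when max_proc = 0: limit is CPUs * max_proc_per_cpu
max_proc_per_cpu = 2
# how often the queue is polled, in seconds
poll_interval = 5.0
# finished logs/items kept per spider, and finished job records kept in the UI/API
jobs_to_keep = 5
finished_to_keep = 100
```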
1 vote · 3 answers

Keep scrapyd running

I have scrapy and scrapyd installed on a Debian machine. I log in to this server using an SSH tunnel. I then start scrapyd by running: scrapyd Scrapyd starts up fine, and I then open up another SSH tunnel to the server and schedule my spider…
user1009453
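Scrapyd exits with the SSH session because it runs attached to that terminal; anything from `nohup scrapyd &` to a supervisor fixes it. A minimal systemd unit sketch (the path and user are placeholders):

```ini
# /etc/systemd/system/scrapyd.service
[Unit]
Description=Scrapyd daemon
After=network.target

[Service]
User=scrapy
ExecStart=/usr/local/bin/scrapyd
Restart=on-failure

[Install]
WantedBy=multi-user.target
```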
1 vote · 1 answer

How can I automate my spider runs using scrapyd?

I know this probably seems ridiculous. I have given up on a Windows scrapyd implementation and have set up an Ubuntu machine and got everything working just great. I have 3 projects, each with their own spider. I can run my spiders from the terminal…
Mark
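Because scrapyd schedules over plain HTTP, automation usually reduces to cron plus curl; a crontab sketch (the project and spider names are placeholders):

```
# crontab -e  -- run the spider every day at 03:00
0 3 * * * curl -s http://localhost:6800/schedule.json -d project=myproject -d spider=myspider
```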
1 vote · 2 answers

How to start the scrapyd server on an EC2 instance

I have set up an instance on AWS. Now I want to start scrapyd on a particular port. According to the documentation: aptitude install scrapyd-X.YY But aptitude is not found. I have tried installing aptitude using yum, but there is no match found (may…
Tasawer Nawaz
  • 927
  • 8
  • 19
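Amazon Linux AMIs ship yum rather than apt/aptitude, and scrapyd is on PyPI anyway, so pip is the simplest route; a sketch (package names vary by AMI, and the port can be changed with `http_port` in scrapyd.conf):

```
# on an Amazon Linux instance (Ubuntu AMIs can use apt-get instead)
sudo yum install -y python-pip
sudo pip install scrapyd
scrapyd          # serves the JSON API and web UI on port 6800 by default
```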