Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `scrapy` itself, but was split out and is now a standalone project. It runs on a machine and lets you deploy (i.e., upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
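
As an illustration, a minimal interaction with that web service, assuming a default instance on localhost:6800 and illustrative project/spider names, might look like this:

```python
import requests

SCRAPYD = "http://localhost:6800"

# Schedule a crawl; Scrapyd responds with a job id on success.
job = requests.post(f"{SCRAPYD}/schedule.json",
                    data={"project": "myproject", "spider": "myspider"}).json()
print(job)  # e.g. {"status": "ok", "jobid": "..."}

# List pending, running, and finished jobs for the project.
jobs = requests.get(f"{SCRAPYD}/listjobs.json",
                    params={"project": "myproject"}).json()
print(jobs["running"])
```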

355 questions
2 votes • 0 answers

Python: how to debug high CPU usage in Scrapy

I'm trying to debug my CPU usage. I have already tried several things: adding sleep(0.1) to pipelines, disabling pipelines, and using Scrapyd with job persistence (the JOBDIR parameter, to save data to disk instead of keeping it in memory), but I guess this only…
Erik van de Ven • 4,747 • 6 • 38 • 80
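
On the job-persistence angle above, a hedged sketch of passing JOBDIR when scheduling through Scrapyd (project, spider, and path are illustrative): JOBDIR makes Scrapy keep its request queue and duplicate filter on disk rather than in memory.

```python
import requests

# schedule.json accepts "setting" parameters as Scrapy settings overrides.
requests.post("http://localhost:6800/schedule.json", data={
    "project": "myproject",
    "spider": "myspider",
    "setting": "JOBDIR=crawls/myspider-1",  # one directory per job
})
```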
2 votes • 2 answers

Rename output file after a Scrapy spider completes

I am using Scrapy and Scrapyd to monitor certain sites. The output files are compressed jsonlines. Right after I submit a job schedule to scrapyd, I can see the output file being created, and it grows as the spider scrapes. My problem is that I can't be sure…
Andy • 1,231 • 1 • 15 • 27
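
One way to know when the file is complete is to poll listjobs.json for the job id returned by schedule.json and rename only once the job reaches the finished list; a sketch, with names and paths illustrative:

```python
import os
import time

import requests

SCRAPYD = "http://localhost:6800"

def rename_when_finished(project, jobid, src, dst, poll_seconds=10):
    """Wait until Scrapyd reports the job finished, then rename its output."""
    while True:
        jobs = requests.get(f"{SCRAPYD}/listjobs.json",
                            params={"project": project}).json()
        if any(job["id"] == jobid for job in jobs.get("finished", [])):
            os.rename(src, dst)
            return
        time.sleep(poll_seconds)
```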
2 votes • 1 answer

Implementing a custom Scrapyd service

I want to create my own service for the scrapyd API, which should return a little more information about running crawlers. I got stuck at the very beginning: where should I place the module that will contain that service? If we look at the default…
ilov3 • 427 • 2 • 7
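
For orientation: Scrapyd's stock endpoints are WsResource subclasses in scrapyd.webservice, wired up in the [services] section of its config, so a hedged sketch of a custom endpoint (module path and endpoint name are illustrative, and Scrapyd's internals vary between versions) could be:

```python
# myproject/webservice.py  (illustrative location; any importable module works)
from scrapyd.webservice import WsResource

class RunningDetails(WsResource):
    """Extra JSON endpoint reporting details about running crawls."""

    def render_GET(self, txrequest):
        # self.root is the Scrapyd application root; its launcher tracks
        # the currently running Scrapy processes.
        running = [{"project": p.project, "spider": p.spider, "job": p.job}
                   for p in self.root.launcher.processes.values()]
        return {"status": "ok", "running": running}

# Then map a URL to it in scrapyd.conf (re-listing the default services
# may be necessary, since the section is replaced rather than merged):
#   [services]
#   runningdetails.json = myproject.webservice.RunningDetails
```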
2 votes • 1 answer

Storing Scrapyd schedule details in a database

Hi, I am using Scrapyd to schedule my spiders. The problem is that I want to keep track of all the historic information about the jobs scheduled so far, but if the scrapyd server restarts, all the information is deleted. My question is: is there…
backtrack • 7,996 • 5 • 52 • 99
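
Since listjobs.json only reflects Scrapyd's in-memory state, one hedged approach is to snapshot it periodically into a database that survives restarts (project name and schema are illustrative):

```python
import sqlite3

import requests

conn = sqlite3.connect("job_history.db")
conn.execute("""CREATE TABLE IF NOT EXISTS jobs
                (id TEXT PRIMARY KEY, spider TEXT, start TEXT, end TEXT)""")

# Run this periodically (e.g. from cron); finished jobs accumulate in
# the table even after the Scrapyd server restarts.
jobs = requests.get("http://localhost:6800/listjobs.json",
                    params={"project": "myproject"}).json()
for job in jobs.get("finished", []):
    conn.execute("INSERT OR IGNORE INTO jobs VALUES (?, ?, ?, ?)",
                 (job["id"], job["spider"],
                  job.get("start_time"), job.get("end_time")))
conn.commit()
```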
2 votes • 0 answers

Can't connect to the Scrapyd API

I'm trying to use the scrapyd service to schedule a spider. I'm on Mac OS 10.9.5. I start the service by running 'scrapyd'; it runs fine and I can navigate to the web interface at http://localhost:6800. But when I try to use the API, for example:…
user1009453 • 707 • 2 • 11 • 28
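
The excerpt cuts off before the actual call, but a common pitfall here is the HTTP method: schedule.json only accepts POST, while the listing endpoints are plain GET. A minimal sanity check (project and spider names illustrative):

```python
import requests

# GET endpoint: quick connectivity check.
print(requests.get("http://localhost:6800/listprojects.json").json())

# schedule.json must be POSTed; issuing a GET returns an error response.
print(requests.post("http://localhost:6800/schedule.json",
                    data={"project": "myproject", "spider": "myspider"}).json())
```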
2 votes • 0 answers

Scrapyd job didn't finish

I used scrapyd to run scrapy jobs, and 127.0.0.1:6800 shows the job as finished. But when I open the log, there is no error message and no finish stats like this: {'downloader/request_bytes': 1685, 'downloader/request_count': 4, …
user2492364 • 6,543 • 22 • 77 • 147
2 votes • 1 answer

Scrapyd error when scheduling a new spider

I cannot schedule a spider run. The deploy seems to be OK: Deploying to project "scraper" in http://localhost:6800/addversion.json Server response (200): {"status": "ok", "project": "scraper", "version": "1418909664", "spiders": 3} Scheduling a new…
sergiuz • 5,353 • 1 • 35 • 51
2 votes • 1 answer

Unable to deploy a Portia spider with scrapyd-deploy

Could you please help me figure out what I'm doing wrong? Here are the steps: followed the Portia install manual found at https://github.com/scrapinghub/portia (all OK); created a new project, entered a URL, tagged an item (all OK); clicked…
Mihai • 133 • 1 • 14
2 votes • 1 answer

Sharing middleware and pipeline code between Scrapyd projects

I have several scrapy projects that I have deployed to a scrapyd instance. They all tend to use the same middleware code, which I created and then duplicated amongst the projects. I would like to avoid this duplication. Is there a…
trajan • 1,093 • 2 • 12 • 15
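
Because scrapyd-deploy builds the uploaded egg from each project's setup.py, one hedged approach (package names illustrative) is to move the shared middleware into its own package and include it in every project's egg:

```python
# setup.py -- the file scrapyd-deploy generates, lightly edited
from setuptools import setup, find_packages

setup(
    name="project1",
    version="1.0",
    # find_packages() picks up both the project package and a
    # shared_middleware package living alongside it.
    packages=find_packages(),
    entry_points={"scrapy": ["settings = project1.settings"]},
)
```

Alternatively, installing the shared package system-wide on the Scrapyd host works too, since the spiders can then import it like any other installed module.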
2 votes • 1 answer

pymongo.errors.ConnectionFailure: timed out on an Ubuntu EC2 instance running scrapyd

So... I'm running scrapyd on my Ubuntu EC2 instance after following this post: http://www.dataisbeautiful.io/deploying-scrapy-ec2/. However, I can't get pymongo to connect to my MongoLabs mongo database, since the Ubuntu EC2 scrapyd logs are…
pyramidface • 1,207 • 2 • 17 • 39
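
To separate a networking or security-group problem from a Scrapyd one, it can help to test the connection directly on the instance with a short timeout; a sketch assuming pymongo 3+ and an illustrative connection URI:

```python
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

# Short server-selection timeout so a blocked outbound port fails fast
# instead of hanging.
client = MongoClient("mongodb://user:password@example.mongolab.com:27017/mydb",
                     serverSelectionTimeoutMS=5000)
try:
    client.admin.command("ping")
    print("connected")
except ConnectionFailure as exc:
    print("cannot reach MongoDB:", exc)
```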
2 votes • 1 answer

Heavy CPU usage by a Scrapy crawler

I have multiple spiders running in parallel in multiple instances (4). All of them use almost 100% CPU. I've deployed them using scrapyd and tried changing scrapyd settings like…
Sravan • 61 • 6
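
On the Scrapyd side, the number of crawl processes run in parallel is controlled in scrapyd.conf; a hedged sketch with illustrative values:

```ini
[scrapyd]
# Hard cap on concurrent Scrapy processes (0 means derive from CPU count).
max_proc = 2
# Used when max_proc is 0: how many processes to allow per CPU.
max_proc_per_cpu = 1
```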
2 votes • 1 answer

Scrapyd Error: exceptions.AttributeError: 'dict' object has no attribute 'fields'

I recently published a working scraper to scrapyd. I'm getting the error message below when I run the scrape. I reviewed this closed issue, https://github.com/scrapy/scrapy/issues/86, and implemented the recommended fix per the docs:…
dfriestedt • 483 • 1 • 3 • 18
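
That traceback usually means a component expected a scrapy.Item, which carries a .fields mapping, but received a plain dict (older Scrapy versions did not accept dicts as items at all). A minimal sketch of yielding a declared Item instead, with illustrative names and selectors:

```python
import scrapy

class ProductItem(scrapy.Item):
    # Declared fields are what give the item its .fields attribute.
    name = scrapy.Field()
    price = scrapy.Field()

class ProductSpider(scrapy.Spider):
    name = "products"
    start_urls = ["http://example.com/"]

    def parse(self, response):
        yield ProductItem(
            name=response.css("h1::text").get(),
            price=response.css(".price::text").get(),
        )
```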
2 votes • 0 answers

Egg file needs permission in scrapyd while deploying

If I do this for another project, it shows: $ scrapy deploy scrapyd Packing version 1412325181 Deploying to project "project2" in http://localhost:6800/addversion.json Server response (200): {"status": "error", "message": "[Errno 13] Permission…
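
An Errno 13 from addversion.json typically means the user running the Scrapyd daemon cannot write to its eggs directory; one hedged fix (path illustrative) is to point eggs_dir at a directory that user owns in scrapyd.conf:

```ini
[scrapyd]
# Must be writable by the account the scrapyd daemon runs as.
eggs_dir = /home/deploy/scrapyd/eggs
```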
2 votes • 1 answer

Schedule a spider in scrapyd and pass spider config options

I'm trying to configure spiders created with slyd to use scrapy-elasticsearch, so I'm sending -d parameter=value to configure it: curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider -d setting=CLOSESPIDER_ITEMCOUNT=100…
localhost • 55 • 1 • 6
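
For reference, schedule.json treats each setting=KEY=VALUE pair as a Scrapy settings override, while any other parameter becomes a spider argument; the equivalent of the curl call in Python, with an illustrative extra spider argument:

```python
import requests

# A list of pairs allows repeating the "setting" key if needed.
requests.post("http://localhost:6800/schedule.json", data=[
    ("project", "myproject"),
    ("spider", "myspider"),
    ("setting", "CLOSESPIDER_ITEMCOUNT=100"),  # settings override
    ("category", "electronics"),               # arrives as a spider kwarg
])
```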
2 votes • 4 answers

Scrapyd with Polipo and Tor

UPDATE: I am now running this command: scrapyd-deploy, and getting this error: 504 Connect to localhost:8123 failed: General SOCKS server failure. I am trying to deploy my scrapy spider through scrapyd-deploy; the following is the…
Moataz Elmasry • 542 • 1 • 5 • 18
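
A plausible reading of that 504 is that the deploy request to the local Scrapyd is itself being routed through Polipo (port 8123) and on into Tor, which cannot reach localhost. Assuming scrapyd-deploy honors the conventional proxy environment variables (worth verifying for your version), a sketch of exempting local addresses:

```python
import os
import subprocess

# Keep the proxy configured for the crawl itself, but exempt local
# addresses so the deploy can reach Scrapyd on localhost:6800 directly.
env = dict(os.environ, no_proxy="localhost,127.0.0.1")
subprocess.run(["scrapyd-deploy"], env=env, check=True)
```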