Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `scrapy` itself, but was split out and is now a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
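
For orientation, a minimal sketch of talking to that JSON web service (assuming a default scrapyd instance on localhost:6800; the project and spider names are hypothetical):

```python
import requests

BASE = "http://localhost:6800"  # scrapyd's default address

# Check that the daemon is up.
print(requests.get(f"{BASE}/daemonstatus.json").json())

# Schedule a run; scrapyd launches the latest uploaded version of the project.
resp = requests.post(
    f"{BASE}/schedule.json",
    data={"project": "myproject", "spider": "myspider"},  # hypothetical names
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}
```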

355 questions
1
vote
1 answer

scrapy_splash.SplashRequest doesn't execute callback function when scheduled by scrapyd

I encountered some strange behaviour (from my perspective) in SplashRequest's callback when it is executed by scrapyd. Scrapy source code: from scrapy.spiders import Spider from scrapy import Request import scrapy from scrapy_splash…
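
For reference, a minimal scrapy_splash spider sketch with the callback wired explicitly (hypothetical names; assumes a Splash instance configured through scrapy-splash's settings). Scrapyd ultimately launches the spider through Scrapy's own crawler process, so the callback wiring is the same as under `scrapy crawl`:

```python
import scrapy
from scrapy_splash import SplashRequest

class SplashExampleSpider(scrapy.Spider):
    name = "splash_example"  # hypothetical spider name
    start_urls = ["https://example.com"]

    def start_requests(self):
        for url in self.start_urls:
            # The callback must be a bound method of the spider.
            yield SplashRequest(url, callback=self.parse_result, args={"wait": 0.5})

    def parse_result(self, response):
        yield {"title": response.css("title::text").get()}
```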
1
vote
1 answer

Update spider code controlled by scrapyd

What is the proper way to install/activate a spider that is controlled by scrapyd? I install a new spider version using scrapyd-deploy; a job is currently running. Do I have to stop the job using cancel.json, then schedule a new job?
Markus
  • 2,412
  • 29
  • 28
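
One answer pattern, sketched against scrapyd's JSON API (default host; project, spider, and job id are hypothetical): upload the new version with scrapyd-deploy, cancel the old job if you don't want it to finish, then schedule again, since new jobs always use the latest uploaded version:

```python
import requests

BASE = "http://localhost:6800"

# Cancel the currently running job (the id comes from schedule.json
# or listjobs.json).
requests.post(f"{BASE}/cancel.json",
              data={"project": "myproject", "job": "old-job-id"})

# Schedule a fresh job; it picks up the newest deployed version.
requests.post(f"{BASE}/schedule.json",
              data={"project": "myproject", "spider": "myspider"})
```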
1
vote
3 answers

How to keep a .sh file always running

I'm new to shell scripting; I want the command to be running always. My .sh file, startscrapy.sh: #!/bin/bash echo "Scrapyd is started now" scrapyd. I have also changed the permissions: chmod +x etc/init.d/startscrapy.sh. I have placed this file…
Vimal Annamalai
  • 139
  • 1
  • 2
  • 12
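
A crude way to keep the daemon alive is a restart loop; a minimal watchdog sketch (a service manager such as systemd or supervisord is the more common answer):

```python
import subprocess
import time

# Restart scrapyd whenever it exits, with a short back-off.
while True:
    proc = subprocess.Popen(["scrapyd"])  # assumes scrapyd is on PATH
    proc.wait()
    time.sleep(5)
```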
1
vote
0 answers

Portia spider not crawling items

I have created a spider using the Portia UI, and I have deployed and scheduled it on one of my virtual machines using scrapyd. The spider ran fine and scraped the website's contents. But when I try to deploy and schedule the same spider on another similar virtual…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
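
When the same project behaves differently on a second machine, one quick check (hypothetical project name) is whether scrapyd there actually sees the spider in the deployed egg:

```python
import requests

# listspiders.json enumerates spiders in the project's latest deployed
# version; an empty list points at a deploy problem rather than a crawl one.
r = requests.get("http://localhost:6800/listspiders.json",
                 params={"project": "myproject"})
print(r.json())
```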
1
vote
2 answers

Securing scrapyd's APIs and Web Interface

I have set up Scrapyd to manage Scrapy spiders in a better way, and it is doing that really well. I am just unsure how to secure it, as I fear that anyone who learns this is a Scrapyd server can use the APIs to manipulate the working of…
harkirat1892
  • 453
  • 5
  • 19
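
Scrapyd itself historically shipped with no authentication, so a common approach is to bind it to localhost and front it with a reverse proxy that enforces HTTP Basic auth (recent scrapyd releases also accept username/password options in scrapyd.conf). A client-side sketch with hypothetical host and credentials:

```python
import requests
from requests.auth import HTTPBasicAuth

resp = requests.post(
    "https://scrapyd.example.com/schedule.json",  # proxy in front of :6800
    data={"project": "myproject", "spider": "myspider"},
    auth=HTTPBasicAuth("scrapyd-user", "s3cret"),  # hypothetical credentials
)
print(resp.json())
```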
1
vote
0 answers

Scrapyd on Heroku can't recognize rewritten DATABASE_URL by heroku-buildpack-pgbouncer

Okay, here is my setup: I'm on Heroku running a scrapyd daemon using the scrapy-heroku package (https://github.com/dmclain/scrapy-heroku). I'm having issues with running out of database connections. I decided to try pooling the database connections use…
jeffjv
  • 3,461
  • 2
  • 21
  • 28
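
Since the buildpack rewrite only exists in the dyno's runtime environment, a first step is to confirm what the scrapyd process actually sees; a small sketch:

```python
import os
from urllib.parse import urlparse

# If heroku-buildpack-pgbouncer rewrote DATABASE_URL, host and port here
# should point at the local pgbouncer, not at the Postgres add-on itself.
url = urlparse(os.environ["DATABASE_URL"])
print(url.hostname, url.port, url.path.lstrip("/"))
```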
1
vote
2 answers

scrapyd - error while running spiders simultaneously

I'm trying to run two Scrapy spiders simultaneously using scrapyd. I execute curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider Result: {"status": "ok", "jobid":…
Rainmaker
  • 10,294
  • 9
  • 54
  • 89
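
Scheduling several spiders is just repeated calls to schedule.json; scrapyd queues the jobs and runs up to its configured process limit in parallel. A sketch (the second spider name is hypothetical):

```python
import requests

for spider in ("somespider", "otherspider"):
    r = requests.post(
        "http://localhost:6800/schedule.json",
        data={"project": "myproject", "spider": spider},
    )
    print(r.json())  # {"status": "ok", "jobid": "..."}

# How many jobs run at once is governed by max_proc / max_proc_per_cpu
# in the [scrapyd] section of scrapyd.conf.
```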
1
vote
1 answer

Disable Scrapyd item storing in .jl feed

Question: I want to know how to disable item storing in Scrapyd. What I tried: I deployed a spider to the Scrapy daemon Scrapyd. The deployed spider stores the scraped data in a database, and that works fine. However, Scrapyd logs each scraped Scrapy…
Pullie
  • 2,685
  • 3
  • 25
  • 31
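
The per-job .jl feeds come from scrapyd's items_dir option; a sketch of the relevant scrapyd.conf fragment, assuming the stock configuration layout:

```ini
[scrapyd]
# An empty items_dir stops scrapyd from writing per-job .jl item feeds;
# items are then handled only by the project's own pipelines.
items_dir =
```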
1
vote
1 answer

scrapyd multiple spiders writing items to same file

I have a scrapyd server with several spiders running at the same time; I start the spiders one by one using the schedule.json endpoint. All spiders write their contents to a common file using a pipeline: class JsonWriterPipeline(object): def __init__(self,…
silvestrelosada
  • 55
  • 1
  • 10
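
Each scrapyd job is a separate process, so multiple spiders appending to one shared file can interleave output. A pipeline sketch that writes one file per spider instead (the file naming is illustrative):

```python
import json

class JsonWriterPipeline:
    def open_spider(self, spider):
        # One .jl file per spider avoids cross-process interleaving.
        self.file = open(f"items-{spider.name}.jl", "a", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item)) + "\n")
        return item
```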
1
vote
1 answer

Scrapy DEPTH_PRIORITY doesn't work

I would like my spider to crawl the start_urls websites entirely before following links deeper into the sites. The crawler's aim is to find expired domains. For example, I create a page with 500 URLs (450 expired & 50 active websites); the crawler must insert in…
Pixel
  • 900
  • 1
  • 13
  • 31
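
For breadth-first crawling, the usual recipe (per Scrapy's FAQ) swaps the scheduler's LIFO queues for FIFO ones alongside DEPTH_PRIORITY; a settings.py sketch:

```python
# Process shallower requests before deeper ones (breadth-first order).
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"
```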
1
vote
1 answer

Deploying egg file to scrapyd server returns {"status": "error", "message": "IndexError: list index out of range"}

Deploying to project "projectname" in http://127.0.0.1:6800/addversion.json Server response (200): {"status": "error", "message": "IndexError: list index out of range"} When I create an egg file and deploy it to the scrapyd server, this kind of error…
Pythonsguru
  • 424
  • 3
  • 11
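
One way to narrow this down is to upload the egg by hand instead of through scrapyd-deploy; if addversion.json still returns the IndexError, the egg itself is the problem. A sketch with hypothetical file and version values:

```python
import requests

with open("projectname-1.0-py3.egg", "rb") as egg:  # hypothetical egg path
    r = requests.post(
        "http://127.0.0.1:6800/addversion.json",
        data={"project": "projectname", "version": "1.0"},
        files={"egg": egg},
    )
print(r.json())
```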
1
vote
0 answers

How to deploy your scrapy spiders for long-term running

I'm building a scraper with the Scrapy framework in order to scrape a webshop. This webshop has several categories and subcategories. I have already finished the spider and it works like a charm. I currently use it via the start_urls = [] parameter of the spider (…
Andronaute
  • 379
  • 3
  • 12
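
Rather than editing start_urls for every run, note that any extra field posted to schedule.json is handed to the spider as an argument, so one deployed spider can be re-scheduled per category. A sketch with hypothetical names:

```python
import requests

# "category" arrives in the spider as self.category (a plain spider arg).
requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "webshop", "spider": "shop", "category": "shoes"},
)
```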
1
vote
2 answers

Fail to scrapyd-deploy

Traceback (most recent call last): File "/usr/local/bin/scrapyd-deploy", line 273, in main() File "/usr/local/bin/scrapyd-deploy", line 95, in main egg, tmpdir = _build_egg() File "/usr/local/bin/scrapyd-deploy", line 240, in…
wyuan
  • 11
  • 4
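
Since the failure is inside scrapyd-deploy's _build_egg(), building the egg directly (which, as far as I know, is what _build_egg does via the project's setup.py) often exposes the underlying error; a sketch:

```python
import subprocess
import sys

# Run from the project root containing scrapy.cfg and setup.py.
subprocess.check_call([sys.executable, "setup.py", "bdist_egg"])
```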
1
vote
1 answer

Install old version of scrapyd

I tried various ways to install an old version of scrapyd but did not succeed: sudo pip install scrapyd-0.24.6 sudo apt-get install scrapyd-0.24.6 Please tell me how I can download and install a specific version of scrapyd. Thanks
Pythonsguru
  • 424
  • 3
  • 11
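
pip pins a release with `==` rather than a hyphen. The shell one-liner is `pip install scrapyd==0.24.6`; the same thing driven from Python:

```python
import subprocess
import sys

# Equivalent to: pip install scrapyd==0.24.6
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "scrapyd==0.24.6"]
)
```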
1
vote
1 answer

Scrapyd vs Windows Task Scheduler

I want to run a small set of Scrapy spiders on an Azure virtual machine. I'm looking for an automation solution. For the time being it seems like Windows Task Scheduler will do the job for running 3-5 spiders on one VM instance. The only concern I…
Turo
  • 1,537
  • 2
  • 21
  • 42