Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `scrapy` itself but has since been split out into a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
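The JSON web service mentioned above is driven by plain HTTP form posts. A minimal sketch of building a request for the schedule.json endpoint with only the standard library (the host, project, and spider names are placeholders):

```python
from urllib.parse import urlencode
from urllib.request import Request

def schedule_request(host, project, spider, **spider_args):
    """Build a POST request for scrapyd's schedule.json endpoint.

    Any extra keyword arguments are forwarded by scrapyd to the
    spider's __init__ as string-valued spider arguments.
    """
    form = {"project": project, "spider": spider, **spider_args}
    return Request(
        f"http://{host}/schedule.json",
        data=urlencode(form).encode("utf-8"),
        method="POST",
    )

req = schedule_request("localhost:6800", "myproject", "myspider")
# urlopen(req) would return a JSON body like {"status": "ok", "jobid": "..."}
```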

355 questions
4 votes · 1 answer

No module named _sqlite3 for scrapyd

I'm a newbie to Python who is stuck needing to recompile Python. I have a CentOS server where I installed Python 3.6 and then scrapyd. Everything installed successfully, but when I try to run the scrapyd command it shows me the below…

Dhaval (901)
4 votes · 1 answer

Scrapyd pause/continue job

It looks like there is no supported way to pause/continue crawling with Scrapyd, as there is with Scrapy itself (scrapy crawl spider -s JOBDIR=jobdir/spider-1). The only workaround I've found was…

Pachvarsh (137)
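On the pause/continue question above, one avenue worth noting: schedule.json accepts a `setting` field, so Scrapy's JOBDIR setting (which keeps resumable crawl state on disk) can be forwarded when scheduling. A sketch with placeholder names:

```python
from urllib.parse import urlencode

# Forward JOBDIR through schedule.json's `setting` field so the job
# keeps resumable state on disk; project, spider, and path are placeholders.
payload = urlencode({
    "project": "myproject",
    "spider": "myspider",
    "setting": "JOBDIR=jobdir/spider-1",
})
# POST this payload to http://localhost:6800/schedule.json
```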
4 votes · 2 answers

Use scrapyd job id in scrapy pipelines

I've implemented a web application that triggers scrapy spiders using the scrapyd API (the web app and scrapyd run on the same server). My web application stores the job ids returned from scrapyd in a DB. My spiders are storing items in…

mouch (335)
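On the job-id question above: scrapyd hands the job id to the spider as the `_job` argument, so a pipeline can read it back from the spider instance. A plain-Python sketch (in a real project these would subclass scrapy's pipeline and Spider classes, and the id shown is made up):

```python
class JobIdPipeline:
    """Stamp each item with the scrapyd job id, if one is present."""
    def process_item(self, item, spider):
        item["job_id"] = getattr(spider, "_job", None)
        return item

class FakeSpider:
    """Stand-in for a spider that scrapyd scheduled."""
    _job = "abc123"  # scrapyd sets this when it schedules the run

item = JobIdPipeline().process_item({"title": "x"}, FakeSpider())
```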
4 votes · 2 answers

Why does scrapyd throw: "'FeedExporter' object has no attribute 'slot'" exception?

I came across a situation where my scrapy code works fine when used from the command line, but when I use the same spider after deploying (scrapyd-deploy) and scheduling with the scrapyd API it throws an error in…

MrPandav (1,831)
4 votes · 2 answers

__init__() got an unexpected keyword argument '_job'

I am trying to use scrapyd with scrapy. When I use the code below it works fine. import scrapy from scrapy.selector import Selector from scrapy.spiders import CrawlSpider from shikari.items import shikariItem from scrapy.http import…

No Sssweat (358)
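On the `_job` error above: scrapyd passes an extra `_job` keyword argument when it schedules a spider, so an `__init__` that does not accept `**kwargs` raises exactly this TypeError. A plain-Python sketch of the usual fix, forwarding unknown kwargs to the base class (in a real project the base class would be scrapy.Spider, whose __init__ stores extra kwargs as attributes):

```python
class BaseSpider:
    # mimics scrapy.Spider.__init__, which stores extra kwargs as attributes
    def __init__(self, *args, **kwargs):
        self.__dict__.update(kwargs)

class MySpider(BaseSpider):
    name = "myspider"

    def __init__(self, category=None, *args, **kwargs):
        super().__init__(*args, **kwargs)  # absorbs _job and friends
        self.category = category

# scrapyd effectively instantiates the spider like this:
spider = MySpider(category="books", _job="abc123")
```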
4 votes · 1 answer

Automatic Schedule Scrapy Crawler With Amazon Web Services

I have a crawler/spider using Python's Scrapy, and I want to schedule a daily crawl with it using Amazon Web Services. What I would like is that every day at, let's say, 01:00 UTC, an EC2 instance is created and launches the Scrapy spider…

Marcus Lind (10,374)
4 votes · 1 answer

Scrapy extension: spider_closed is not called

I have an extension which attaches to spider_opened and spider_closed. The spider_opened method is correctly called, but the spider_closed method is not. I close the spider by calling the scrapyd cancel endpoint. class SpiderCtlExtension(object): …

kutschkem (7,826)
4 votes · 1 answer

Providing url for spider using scrapyd api

I tried something like: payload = {"project": settings['BOT_NAME'], "spider": crawler_name, "start_urls": ["http://www.foo.com"]} response = requests.post("http://192.168.1.41:6800/schedule.json", …

timfeirg (1,426)
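On the payload above: schedule.json forwards unknown fields to the spider as plain strings, so a list like start_urls has to be serialized on the way in and parsed in the spider's __init__. A sketch using JSON encoding (project and spider names are placeholders):

```python
import json

payload = {
    "project": "myproject",
    "spider": "myspider",
    "start_urls": json.dumps(["http://www.foo.com"]),  # one string field
}

class MySpider:  # stand-in for scrapy.Spider
    def __init__(self, start_urls="[]", **kwargs):
        # scrapyd delivers start_urls as a string; decode it back to a list
        self.start_urls = json.loads(start_urls)

spider = MySpider(start_urls=payload["start_urls"])
```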
4 votes · 2 answers

Passing json arguments to a spider in scrapy

I need to pass a spider some parameters taken from a JSON file. I have read that this is possible through scrapyd using schedule.json, but I don't understand how to pass the JSON file. Does anyone have experience with this?

eng_mazzy (1,049)
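On the JSON-file question above: schedule.json takes only form fields, not file uploads, so the usual pattern is to read the file client-side and forward its contents as one string argument that the spider then decodes. A sketch (the field names are made up):

```python
import json

# Client side: read parameters from a JSON file and flatten to one string.
# The literal dict stands in for open("params.json").read().
params_text = json.dumps({"category": "books", "depth": 2})
payload = {"project": "myproject", "spider": "myspider", "params": params_text}

class MySpider:  # stand-in for scrapy.Spider
    def __init__(self, params="{}", **kwargs):
        self.params = json.loads(params)  # decode what schedule.json forwarded

spider = MySpider(params=payload["params"])
```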
4 votes · 1 answer

Scrapyd can't find the project name

I am getting an error when I try to run an existing scrapy project on scrapyd. I have a working scrapy project (url_finder) and a working spider in that project used for testing (test_ip_spider_1x) that simply downloads whatismyip.com. I…

gpanterov (1,365)
4 votes · 1 answer

Scrapyd: How to set scrapyd task priority?

I have several scrapy projects, and I deploy all of them with scrapyd. Some of the spiders are slow while others are fast. Now I want to run the fast spiders first. How can I do that?

Zhang Jiuzhou (759)
4 votes · 1 answer

How to set scrapy IMAGES_STORE relative path

I am trying to set IMAGES_STORE as a relative path, but I am getting an error; if I specify IMAGES_STORE as a full path, /home/vaibhav/scrapyprog/comparison/eScraperInterface/images, it works fine. The error I am getting is at the link. Actually it…

Vaibhav Jain (5,287)
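On the IMAGES_STORE question above: a common way to avoid hard-coding an absolute path is to compute one at runtime, anchored to the settings file's own location. A sketch of what might go in settings.py (the "images" directory name is a placeholder):

```python
import os

# Anchor the image store to the directory containing this settings file,
# so the resulting path is absolute regardless of where scrapyd runs it.
PROJECT_DIR = os.path.dirname(os.path.abspath(__file__))
IMAGES_STORE = os.path.join(PROJECT_DIR, "images")
```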
4 votes · 1 answer

Enabling HttpProxyMiddleware in scrapyd

After reading the scrapy documentation, I thought that HttpProxyMiddleware is enabled by default. But when I start a spider via scrapyd's webservice interface, HttpProxyMiddleware is not enabled. I receive the following output: 2013-02-18…

digitalmonkey (149)
3 votes · 1 answer

Scrapy server setup

I'm trying to set up a scrapyd server on AWS and access it from my local machine. So far, I've managed to get scrapyd running on the remote machine. I know it's running because when I start scrapyd I get start: Job is already running:…

zsquare (9,916)
3 votes · 0 answers

Call to deprecated function retry_on_eintr. retry_on_eintr(check_call, [sys.executable, 'setup.py', 'clean', '-a', 'bdist_egg', '-d', d]

I have to deploy my scrapy project with scrapyd on Windows Server 2016. I am using the command scrapyd-deploy local to deploy my project, but it generates the following error: Call to deprecated function retry_on_eintr. …