Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `scrapy` itself but has since been split out into a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
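The JSON web service mentioned above is driven by plain HTTP form posts. A minimal sketch of building a request for the schedule.json endpoint with only the standard library (the host, project, and spider names are placeholders):

```python
from urllib.parse import urlencode
from urllib.request import Request

def schedule_request(host, project, spider, **spider_args):
    """Build a POST request for scrapyd's schedule.json endpoint.

    Any extra keyword arguments are forwarded by scrapyd to the
    spider's __init__ as string-valued spider arguments.
    """
    form = {"project": project, "spider": spider, **spider_args}
    return Request(
        f"http://{host}/schedule.json",
        data=urlencode(form).encode("utf-8"),
        method="POST",
    )

req = schedule_request("localhost:6800", "myproject", "myspider")
# urlopen(req) would return a JSON body like {"status": "ok", "jobid": "..."}
```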

355 questions
4 votes · 1 answer

No module named _sqlite3 for scrapyd

I'm a newbie to Python who is stuck needing to recompile Python. I have a CentOS server where I installed Python 3.6 and then scrapyd. Everything installed successfully, but when I try to run the scrapyd command it shows me the below…

Dhaval (901)
4 votes · 1 answer

Scrapyd pause/continue job

It looks like there is no supported way to pause/continue crawling with Scrapyd, as there is with Scrapy itself (scrapy crawl spider -s JOBDIR=jobdir/spider-1). The only workaround I've found was…

Pachvarsh (137)
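On the pause/continue question above, one avenue worth noting: schedule.json accepts a `setting` field, so Scrapy's JOBDIR setting (which keeps resumable crawl state on disk) can be forwarded when scheduling. A sketch with placeholder names:

```python
from urllib.parse import urlencode

# Forward JOBDIR through schedule.json's `setting` field so the job
# keeps resumable state on disk; project, spider, and path are placeholders.
payload = urlencode({
    "project": "myproject",
    "spider": "myspider",
    "setting": "JOBDIR=jobdir/spider-1",
})
# POST this payload to http://localhost:6800/schedule.json
```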
4 votes · 2 answers

Use scrapyd job id in scrapy pipelines

I've implemented a web application that triggers scrapy spiders using the scrapyd API (the web app and scrapyd run on the same server). My web application stores the job ids returned from scrapyd in a DB. My spiders are storing items in…

mouch (335)
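On the job-id question above: scrapyd hands the job id to the spider as the `_job` argument, so a pipeline can read it back from the spider instance. A plain-Python sketch (in a real project these would subclass scrapy's pipeline and Spider classes, and the id shown is made up):

```python
class JobIdPipeline:
    """Stamp each item with the scrapyd job id, if one is present."""
    def process_item(self, item, spider):
        item["job_id"] = getattr(spider, "_job", None)
        return item

class FakeSpider:
    """Stand-in for a spider that scrapyd scheduled."""
    _job = "abc123"  # scrapyd sets this when it schedules the run

item = JobIdPipeline().process_item({"title": "x"}, FakeSpider())
```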
4 votes · 2 answers

Why does scrapyd throw: "'FeedExporter' object has no attribute 'slot'" exception?

I came across a situation where my scrapy code works fine when used from the command line, but when I use the same spider after deploying (scrapyd-deploy) and scheduling with the scrapyd API it throws an error in…

MrPandav (1,831)
4 votes · 2 answers

__init__() got an unexpected keyword argument '_job'

I am trying to use scrapyd with scrapy. When I use the code below it works fine. import scrapy from scrapy.selector import Selector from scrapy.spiders import CrawlSpider from shikari.items import shikariItem from scrapy.http import…

No Sssweat (358)
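On the `_job` error above: scrapyd passes an extra `_job` keyword argument when it schedules a spider, so an `__init__` that does not accept `**kwargs` raises exactly this TypeError. A plain-Python sketch of the usual fix, forwarding unknown kwargs to the base class (in a real project the base class would be scrapy.Spider, whose __init__ stores extra kwargs as attributes):

```python
class BaseSpider:
    # mimics scrapy.Spider.__init__, which stores extra kwargs as attributes
    def __init__(self, *args, **kwargs):
        self.__dict__.update(kwargs)

class MySpider(BaseSpider):
    name = "myspider"

    def __init__(self, category=None, *args, **kwargs):
        super().__init__(*args, **kwargs)  # absorbs _job and friends
        self.category = category

# scrapyd effectively instantiates the spider like this:
spider = MySpider(category="books", _job="abc123")
```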
4 votes · 1 answer

Automatic Schedule Scrapy Crawler With Amazon Web Services

I have a crawler/spider using Python's Scrapy, and I want to schedule a daily crawl with it using Amazon Web Services. What I would like is that every day at, let's say, 01:00 UTC, an EC2 instance is created and launches the Scrapy spider…

Marcus Lind (10,374)
4 votes · 1 answer

Scrapy extension: spider_closed is not called

I have an extension which attaches to spider_opened and spider_closed. The spider_opened method is correctly called, but the spider_closed method is not. I close the spider by calling the scrapyd cancel endpoint. class SpiderCtlExtension(object): …

kutschkem (7,826)
4 votes · 1 answer

Providing url for spider using scrapyd api

I tried something like: payload = {"project": settings['BOT_NAME'], "spider": crawler_name, "start_urls": ["http://www.foo.com"]} response = requests.post("http://192.168.1.41:6800/schedule.json", …

timfeirg (1,426)
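On the payload above: schedule.json forwards unknown fields to the spider as plain strings, so a list like start_urls has to be serialized on the way in and parsed in the spider's __init__. A sketch using JSON encoding (project and spider names are placeholders):

```python
import json

payload = {
    "project": "myproject",
    "spider": "myspider",
    "start_urls": json.dumps(["http://www.foo.com"]),  # one string field
}

class MySpider:  # stand-in for scrapy.Spider
    def __init__(self, start_urls="[]", **kwargs):
        # scrapyd delivers start_urls as a string; decode it back to a list
        self.start_urls = json.loads(start_urls)

spider = MySpider(start_urls=payload["start_urls"])
```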
4 votes · 2 answers

Passing json arguments to a spider in scrapy

I need to pass a spider some parameters taken from a JSON file. I have read that this is possible through scrapyd using schedule.json, but I don't understand how to pass the JSON file. Does anyone have experience with this?

eng_mazzy (1,049)
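On the JSON-file question above: schedule.json takes only form fields, not file uploads, so the usual pattern is to read the file client-side and forward its contents as one string argument that the spider then decodes. A sketch (the field names are made up):

```python
import json

# Client side: read parameters from a JSON file and flatten to one string.
# The literal dict stands in for open("params.json").read().
params_text = json.dumps({"category": "books", "depth": 2})
payload = {"project": "myproject", "spider": "myspider", "params": params_text}

class MySpider:  # stand-in for scrapy.Spider
    def __init__(self, params="{}", **kwargs):
        self.params = json.loads(params)  # decode what schedule.json forwarded

spider = MySpider(params=payload["params"])
```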
4 votes · 1 answer

Scrapyd can't find the project name

I am getting an error when I try to run an existing scrapy project on scrapyd. I have a working scrapy project (url_finder) and a working spider in that project used for testing (test_ip_spider_1x) that simply downloads whatismyip.com. I…

gpanterov (1,365)
4 votes · 1 answer

Scrapyd: How to set scrapyd task priority?

I have several scrapy projects, and I deploy all of them with scrapyd. Some of the spiders are slow while others are fast. Now I want to run the fast spiders first. How can I do that?

Zhang Jiuzhou (759)
4 votes · 1 answer

How to set scrapy IMAGES_STORE relative path

I am trying to set IMAGES_STORE as a relative path, but I am getting an error; if I specify IMAGES_STORE as a full path, /home/vaibhav/scrapyprog/comparison/eScraperInterface/images, it works fine. The error I am getting is at the link. Actually it…

Vaibhav Jain (5,287)
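On the IMAGES_STORE question above: a common way to avoid hard-coding an absolute path is to compute one at runtime, anchored to the settings file's own location. A sketch of what might go in settings.py (the "images" directory name is a placeholder):

```python
import os

# Anchor the image store to the directory containing this settings file,
# so the resulting path is absolute regardless of where scrapyd runs it.
PROJECT_DIR = os.path.dirname(os.path.abspath(__file__))
IMAGES_STORE = os.path.join(PROJECT_DIR, "images")
```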
4 votes · 1 answer

Enabling HttpProxyMiddleware in scrapyd

After reading the scrapy documentation, I thought that HttpProxyMiddleware is enabled by default. But when I start a spider via scrapyd's webservice interface, HttpProxyMiddleware is not enabled. I receive the following output: 2013-02-18…

digitalmonkey (149)
3 votes · 1 answer

Scrapy server setup

I'm trying to set up a scrapyd server on AWS and access it from my local machine. So far, I've managed to get scrapyd running on the remote machine. I know it's running because when I start scrapyd I get start: Job is already running:…

zsquare (9,916)
3 votes · 0 answers

Call to deprecated function retry_on_eintr. retry_on_eintr(check_call, [sys.executable, 'setup.py', 'clean', '-a', 'bdist_egg', '-d', d]

I have to deploy my scrapy project with scrapyd on Windows Server 2016. I am using the command scrapyd-deploy local to deploy my project, but it generates the following error: Call to deprecated function retry_on_eintr. …