Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `Scrapy` itself but was split out and is now maintained as a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects, and each project can have multiple versions uploaded, but only the latest version is used when launching new spiders.
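As a rough sketch of what that JSON web service looks like, the snippet below builds (but does not send) the POST request that scrapyd's `schedule.json` endpoint expects. The host, port, project name, and spider name are placeholder assumptions (6800 is scrapyd's default port):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Build the form-encoded POST that schedules a spider run on scrapyd.
# "myproject" and "myspider" are placeholder names; nothing is sent here.
data = urlencode({"project": "myproject", "spider": "myspider"}).encode()
req = Request("http://localhost:6800/schedule.json", data=data)

print(req.get_method())   # a Request with a body defaults to POST
print(req.data.decode())  # project=myproject&spider=myspider
```

On the command line the same call is the familiar `curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider`.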

355 questions
3
votes
1 answer

Scrapyd fails on deprecated settings in environment variables

We run Scrapy 2.1.0 and Scrapyd on Python 3.6 on Ubuntu 18.04, and I ran into a problem that I need help understanding how to solve the right way. I'm new to Python (coming from other languages), so please speak slowly and loudly so I understand…
Kalle
  • 452
  • 2
  • 4
  • 19
3
votes
1 answer

Scrapyd-Deploy: SPIDER_MODULES not found

I am trying to deploy a scrapy 2.1.0 project with scrapy-deploy 1.2 and get this error: scrapyd-deploy example /Library/Frameworks/Python.framework/Versions/3.8/bin/scrapyd-deploy:23: ScrapyDeprecationWarning: Module `scrapy.utils.http` is…
merlin
  • 2,717
  • 3
  • 29
  • 59
3
votes
1 answer

How to fix the scrapy.utils.http deprecation warning

I am getting a deprecation warning while trying to run scrapy deploy. Pretty new at this scraping. deploy.py:23: ScrapyDeprecationWarning: Module scrapy.utils.http is deprecated, Please import from `w3lib.http` instead. from scrapy.utils.http import…
Marshall
  • 37
  • 7
3
votes
2 answers

Scrapyd-deploy to Scrapyd doesn't install requirements specified in setup.py

I have a project written with Scrapy. This spider has a lot of requirements in setup.py. Here is a simple example. I run scrapyd-deploy and get the following output: Packing version 1506254163 Deploying to project "quotesbot" in…
lovesuper
  • 323
  • 3
  • 12
3
votes
1 answer

Scrapyd raises NotADirectoryError from .egg file

I use Scrapyd to run my spiders dynamically. I added a .txt file that holds a list of blocked words. My problem is the following: when I run the Scrapyd server as a daemon, it raises this error during scraping: NotADirectoryError: [Errno 20] Not a directory:…
amarynets
  • 1,765
  • 10
  • 27
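A common cause of that error is opening a data file with a plain filesystem path, which cannot work once scrapyd has packed the project into an .egg; reading the resource through the package machinery avoids it. The sketch below uses the stdlib `pkgutil.get_data` on a stdlib package (`encodings`) only so it is self-contained; in a real project you would pass your own package name and the relative path of the bundled .txt file (both placeholders here):

```python
import pkgutil

# Read a resource that lives inside a package, whether the package is a
# plain directory or zipped up inside an .egg. Demonstrated on the stdlib
# "encodings" package; replace with e.g. ("myproject", "data/blockwords.txt").
data = pkgutil.get_data("encodings", "__init__.py")

print(type(data).__name__)  # bytes
```

`get_data` returns the raw bytes of the resource, so decode it yourself if the file is text.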
3
votes
1 answer

TypeError in scrapyd

I have started scrapyd from my cmd, and the website "localhost:8600" shows normally. Then I began to deploy a project named scrapyd_prac, and changed the content of the project's "scrapy.cfg" as: [deploy:localhost] url = http://localhost:6800/ …
Eva Frost
  • 31
  • 2
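For context, a deploy target in `scrapy.cfg` typically looks like the fragment below; the target name, project name, and URL are placeholder assumptions, not the asker's actual values:

```ini
# scrapy.cfg — deploy target consumed by scrapyd-deploy (placeholder values)
[settings]
default = myproject.settings

[deploy:localhost]
url = http://localhost:6800/
project = myproject
```

With the project name set in the target, `scrapyd-deploy localhost` deploys the latest code to that daemon.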
3
votes
1 answer

Curl command with multiple -d parameters in the Paw App

Consider the following "works for me" curl command: curl http://192.168.2.131:6800/schedule.json -d project=a -d spider=b. I have no idea how to execute this POST in Paw. The curl importer in Paw transforms this into one body parameter:…
Pullie
  • 2,685
  • 3
  • 25
  • 31
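What curl does with repeated `-d` flags is simply join the values with `&` into a single `application/x-www-form-urlencoded` body, which is the form a GUI HTTP client needs to reproduce. A minimal sketch of that equivalence, using the parameter values from the question:

```python
from urllib.parse import urlencode

# curl ... -d project=a -d spider=b concatenates the -d values with "&"
# into one application/x-www-form-urlencoded request body.
pairs = [("project", "a"), ("spider", "b")]
body = urlencode(pairs)

print(body)  # project=a&spider=b
```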
3
votes
0 answers

How to share an object instance among spiders running on scrapyd

I need to share one common object instance among crawlers / spiders running on scrapyd. The best scenario would be to hook the object's methods to each spider's signals, something like ext = CommonObject() crawler.signals.connect( ext.onSpiderOpen, …
3
votes
0 answers

Proper way to run multiple scrapy spiders

I just tried running multiple spiders in the same process following the new Scrapy documentation, but I am getting: AttributeError: 'CrawlerProcess' object has no attribute 'crawl'. I found this SO post with the same problem, so I tried using the code from…
loremIpsum1771
  • 2,497
  • 5
  • 40
  • 87
3
votes
1 answer

Scrapyd Retry if Errback in twisted deferreds / Could not open CONNECT tunnel Error

I'm working with a cycle of proxies in Scrapy; sometimes some proxies have errors, and that stops my spider. The error is "Could not open CONNECT tunnel". How do I change the code to retry with another proxy in case of this error? Here's the…
Xtmx
  • 53
  • 6
3
votes
0 answers

Running a new job (run) for each start URL using scrapyd

I have two separate spiders… Spider 1 gets the list of URLs from given HTML pages. Spider 2 uses the URLs scraped by the previous spider as its start URLs and starts scraping the pages. Now what I am trying to do is schedule it…
MrPandav
  • 1,831
  • 1
  • 20
  • 24
3
votes
2 answers

Deploy scrapyd spider with external packages

The scrapyd docs include the following note: scrapyd-deploy won’t deploy anything outside the project module... Does that mean that I cannot import from site-packages in my spiders? My spiders rely on external libraries such as MySQL-python and…
chishaku
  • 4,577
  • 3
  • 25
  • 33
3
votes
2 answers

Error when deploying scrapy project

I have the scrapy 0.24, scrapyd, and scrapyd-deploy packages installed from the Ubuntu repos. When I try to deploy my project: admin@p:~/scrapy$ sudo scrapyd-deploy -l dealernet http://localhost:6800/ admin@p:~/scrapy$ scrapy deploy -l…
TarasH
  • 55
  • 10
3
votes
2 answers

How to monitor the status of scrapy while running

I want to know what is going on while Scrapy is running. How can I monitor the status?
Spy
  • 161
  • 1
  • 8
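One scrapyd-specific way to monitor runs, sketched under the assumption of a daemon on the default port 6800: the `listjobs.json` endpoint reports pending, running, and finished jobs for a project. The snippet only constructs the GET URL (the project name is a placeholder); nothing is sent:

```python
from urllib.parse import urlencode

# listjobs.json takes the project name as a query-string parameter.
query = urlencode({"project": "myproject"})
url = f"http://localhost:6800/listjobs.json?{query}"

print(url)  # http://localhost:6800/listjobs.json?project=myproject
```

Fetching that URL returns a JSON document with `pending`, `running`, and `finished` job lists.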
3
votes
1 answer

Why is HttpCacheMiddleware disabled in scrapyd?

Why does HttpCacheMiddleware need scrapy.cfg, and how do I work around this issue? I use scrapyd-deploy to build the egg and deploy the project to scrapyd. When the job runs, I see from the log output that HttpCacheMiddleware is disabled because…
Sam
  • 1,288
  • 1
  • 13
  • 22