Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `Scrapy` itself but was split out and is now maintained as a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects, and each project can have multiple versions uploaded, but only the latest version is used when launching new spiders.
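As a rough sketch of what that JSON web service looks like, the snippet below builds (but does not send) the POST request that scrapyd's `schedule.json` endpoint expects. The host, port, project name, and spider name are placeholder assumptions (6800 is scrapyd's default port):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Build the form-encoded POST that schedules a spider run on scrapyd.
# "myproject" and "myspider" are placeholder names; nothing is sent here.
data = urlencode({"project": "myproject", "spider": "myspider"}).encode()
req = Request("http://localhost:6800/schedule.json", data=data)

print(req.get_method())   # a Request with a body defaults to POST
print(req.data.decode())  # project=myproject&spider=myspider
```

On the command line the same call is the familiar `curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider`.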

355 questions
3
votes
1 answer

Scrapyd fails on deprecated settings in environment variables

We run Scrapy 2.1.0 and Scrapyd on Python 3.6 on Ubuntu 18.04, and I ran into a problem that I need help understanding how to solve the right way. I'm new to Python (coming from other languages), so please speak slowly and loudly so I understand…
Kalle
  • 452
  • 2
  • 4
  • 19
3
votes
1 answer

Scrapyd-Deploy: SPIDER_MODULES not found

I am trying to deploy a scrapy 2.1.0 project with scrapy-deploy 1.2 and get this error: scrapyd-deploy example /Library/Frameworks/Python.framework/Versions/3.8/bin/scrapyd-deploy:23: ScrapyDeprecationWarning: Module `scrapy.utils.http` is…
merlin
  • 2,717
  • 3
  • 29
  • 59
3
votes
1 answer

How to fix the scrapy.utils.http deprecation warning

I am getting a deprecation warning while trying to run scrapy deploy. Pretty new at this scraping. deploy.py:23: ScrapyDeprecationWarning: Module scrapy.utils.http is deprecated, Please import from `w3lib.http` instead. from scrapy.utils.http import…
Marshall
  • 37
  • 7
3
votes
2 answers

Scrapyd-deploy to Scrapyd doesn't install requirements specified in setup.py

I have a project written with Scrapy. This spider has a lot of requirements in setup.py. Here is a simple example. I run scrapyd-deploy and get the following output: Packing version 1506254163 Deploying to project "quotesbot" in…
lovesuper
  • 323
  • 3
  • 12
3
votes
1 answer

Scrapyd raises NotADirectoryError from .egg file

I use Scrapyd to run my spiders dynamically. I added a .txt file that holds a list of blocked words. My problem is the following: when I run the Scrapyd server as a daemon, it raises this error during scraping: NotADirectoryError: [Errno 20] Not a directory:…
amarynets
  • 1,765
  • 10
  • 27
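A common cause of that error is opening a data file with a plain filesystem path, which cannot work once scrapyd has packed the project into an .egg; reading the resource through the package machinery avoids it. The sketch below uses the stdlib `pkgutil.get_data` on a stdlib package (`encodings`) only so it is self-contained; in a real project you would pass your own package name and the relative path of the bundled .txt file (both placeholders here):

```python
import pkgutil

# Read a resource that lives inside a package, whether the package is a
# plain directory or zipped up inside an .egg. Demonstrated on the stdlib
# "encodings" package; replace with e.g. ("myproject", "data/blockwords.txt").
data = pkgutil.get_data("encodings", "__init__.py")

print(type(data).__name__)  # bytes
```

`get_data` returns the raw bytes of the resource, so decode it yourself if the file is text.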
3
votes
1 answer

TypeError in scrapyd

I have started scrapyd from my cmd, and the website "localhost:8600" shows normally. Then I began to deploy a project named scrapyd_prac, and changed the content of the project's "scrapy.cfg" as: [deploy:localhost] url = http://localhost:6800/ …
Eva Frost
  • 31
  • 2
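For context, a deploy target in `scrapy.cfg` typically looks like the fragment below; the target name, project name, and URL are placeholder assumptions, not the asker's actual values:

```ini
# scrapy.cfg — deploy target consumed by scrapyd-deploy (placeholder values)
[settings]
default = myproject.settings

[deploy:localhost]
url = http://localhost:6800/
project = myproject
```

With the project name set in the target, `scrapyd-deploy localhost` deploys the latest code to that daemon.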
3
votes
1 answer

Curl command with multiple -d parameters in the Paw App

Consider the following "works for me" curl command: curl http://192.168.2.131:6800/schedule.json -d project=a -d spider=b. I have no idea how to execute this POST in Paw. The curl importer in Paw transforms this into one body parameter:…
Pullie
  • 2,685
  • 3
  • 25
  • 31
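What curl does with repeated `-d` flags is simply join the values with `&` into a single `application/x-www-form-urlencoded` body, which is the form a GUI HTTP client needs to reproduce. A minimal sketch of that equivalence, using the parameter values from the question:

```python
from urllib.parse import urlencode

# curl ... -d project=a -d spider=b concatenates the -d values with "&"
# into one application/x-www-form-urlencoded request body.
pairs = [("project", "a"), ("spider", "b")]
body = urlencode(pairs)

print(body)  # project=a&spider=b
```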
3
votes
0 answers

How to share an object instance among spiders running on scrapyd

I need to share one common object instance among crawlers / spiders running on scrapyd. The best scenario would be to hook the object's methods to each spider's signals, something like ext = CommonObject() crawler.signals.connect( ext.onSpiderOpen, …
3
votes
0 answers

Proper way to run multiple scrapy spiders

I just tried running multiple spiders in the same process following the new Scrapy documentation, but I am getting: AttributeError: 'CrawlerProcess' object has no attribute 'crawl'. I found this SO post with the same problem, so I tried using the code from…
loremIpsum1771
  • 2,497
  • 5
  • 40
  • 87
3
votes
1 answer

Scrapyd Retry if Errback in twisted deferreds / Could not open CONNECT tunnel Error

I'm working with a cycle of proxies in Scrapy; sometimes some proxies have errors, and that stops my spider. The error is "Could not open CONNECT tunnel". How do I change the code to retry with another proxy in case of this error? Here's the…
Xtmx
  • 53
  • 6
3
votes
0 answers

Running a new job (run) for each start URL using scrapyd

I have two separate spiders… Spider 1 gets the list of URLs from given HTML pages. Spider 2 uses the URLs scraped by the previous spider as its start URLs and starts scraping the pages. Now what I am trying to do is schedule it…
MrPandav
  • 1,831
  • 1
  • 20
  • 24
3
votes
2 answers

Deploy scrapyd spider with external packages

The scrapyd docs include the following note: scrapyd-deploy won’t deploy anything outside the project module... Does that mean that I cannot import from site-packages in my spiders? My spiders rely on external libraries such as MySQL-python and…
chishaku
  • 4,577
  • 3
  • 25
  • 33
3
votes
2 answers

Error when deploying scrapy project

I have the scrapy 0.24, scrapyd, and scrapyd-deploy packages installed from the Ubuntu repos. When I try to deploy my project: admin@p:~/scrapy$ sudo scrapyd-deploy -l dealernet http://localhost:6800/ admin@p:~/scrapy$ scrapy deploy -l…
TarasH
  • 55
  • 10
3
votes
2 answers

How to monitor the status of scrapy while running

I want to know what is going on while Scrapy is running. How can I monitor the status?
Spy
  • 161
  • 1
  • 8
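One scrapyd-specific way to monitor runs, sketched under the assumption of a daemon on the default port 6800: the `listjobs.json` endpoint reports pending, running, and finished jobs for a project. The snippet only constructs the GET URL (the project name is a placeholder); nothing is sent:

```python
from urllib.parse import urlencode

# listjobs.json takes the project name as a query-string parameter.
query = urlencode({"project": "myproject"})
url = f"http://localhost:6800/listjobs.json?{query}"

print(url)  # http://localhost:6800/listjobs.json?project=myproject
```

Fetching that URL returns a JSON document with `pending`, `running`, and `finished` job lists.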
3
votes
1 answer

Why is HttpCacheMiddleware disabled in scrapyd?

Why does HttpCacheMiddleware need scrapy.cfg, and how do I work around this issue? I use scrapyd-deploy to build the egg and deploy the project to scrapyd. When the job runs, I see from the log output that HttpCacheMiddleware is disabled because…
Sam
  • 1,288
  • 1
  • 13
  • 22