Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `Scrapy` itself, but was separated out and is now a standalone project. It runs on a machine and allows you to deploy (that is, upload) your projects and control the spiders they contain using a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
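
As a rough sketch of how that JSON web service might be called, the snippet below builds (but does not send) a request that schedules a spider run. It assumes a Scrapyd instance at `http://localhost:6800` (Scrapyd's default port) and uses its `schedule.json` endpoint; the helper `build_schedule_request` and the names `myproject` and `myspider` are hypothetical placeholders, not part of the tag description.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Scrapyd is listening locally on its default port.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Build a POST request for Scrapyd's schedule.json endpoint."""
    # schedule.json expects form-encoded "project" and "spider" parameters.
    body = urlencode({"project": project, "spider": spider}).encode("utf-8")
    return Request(f"{base_url}/schedule.json", data=body, method="POST")


# "myproject" and "myspider" are hypothetical names used for illustration.
request = build_schedule_request("myproject", "myspider")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually send it to a running daemon. No version is given here because, as noted above, new spider runs use the latest uploaded version of the project.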

355 questions
2
votes
1 answer

Deploying scrapy project on heroku

I have a scrapy spider project which scrapes a website and fetches the data on it. My spider yields two JSON files in which all the scraped data is stored. Now, I have a flask web service which uses these two JSON files to…
Yogesh D
  • 1,663
  • 2
  • 23
  • 38
2
votes
4 answers

windows scrapyd-deploy is not recognized

I installed scrapyd like this: pip install scrapyd. I want to use scrapyd-deploy. When I type scrapyd, I get this error in cmd: 'scrapyd' is not recognized as an internal or external command, operable program or batch file.
Marco Dinatsoli
  • 10,322
  • 37
  • 139
  • 253
2
votes
1 answer

Scrapyd jobs aren't finishing

I have several spiders that I need to crawl. I am using scrapyd with default settings. I managed to schedule my jobs with scrapyd interface. Everything at this point is fine, except that jobs aren't ending. Every time I check I find that 16 (4 jobs…
Mourad
  • 21
  • 7
2
votes
1 answer

unable to deploy scrapy project

Suddenly my scrapy deployment started failing: sudo scrapy deploy default -p eScraper Password: Building egg of eScraper-1372327569 'build/scripts-2.7' does not exist -- can't clean it zip_safe flag not set; analyzing archive…
Vaibhav Jain
  • 5,287
  • 10
  • 54
  • 114
2
votes
3 answers

Scrapyd deploy "Attribute Error: NoneType object has no attribute 'module_name'"

I'm trying to use scrapy to deploy my crawler project to a scrapyd instance, but calling the command returns the following error: Server response (200): {"status": "error", "message": "AttributeError: 'NoneType' object has no attribute…
The.Anti.9
  • 43,474
  • 48
  • 123
  • 161
2
votes
2 answers

How to run multiple scrapyd servers?

I have been searching for documentation on the Scrapyd service, but it is very sparse. I was wondering if anyone has any idea how to set up multiple Scrapyd servers that point to the same schedule queue?
IamnotBatman
  • 342
  • 3
  • 7
2
votes
1 answer

Crawler processes dying unexpectedly

I am facing a problem with crawler processes dying unexpectedly. I am using scrapy 0.14; the problem existed in 0.12 as well. The scrapyd log shows entries like: Process died: exitstatus=None The spider logs don't show spider closed information as…
Jeff Borden
  • 1,369
  • 1
  • 19
  • 30
1
vote
1 answer

How to Change Timezone of scrapydweb?

I just installed scrapydweb on my system, and my problem is that I can't change the timezone. The default is UTC+00:00 and I want to change it to UTC+03:30. How can I do this? I changed the timezone of my Linux server to Asia/Tehran. But unfortunately this…
Ali Esmaeili
  • 153
  • 8
1
vote
0 answers

Run scrapy multiple spiders concurrently

I have 100 spiders within the same scrapy project. What is the best way to run them all concurrently, every 8 hours, using some kind of scheduler? The data scraped from all spiders is inserted into the same database table.
fractal397
  • 534
  • 1
  • 5
  • 11
1
vote
0 answers

Why does Scrapyd time out due to refused connection?

I am operating several cloud instances where scrapyd is scheduling scrapy crawlers that write to a remote db server (MySQL 8.x on Ubuntu 20.04). This worked for months. Suddenly it was not possible to deploy with scrapyd-deploy to one of the…
merlin
  • 2,717
  • 3
  • 29
  • 59
1
vote
0 answers

Configuring Scrapyd + Django on Docker to use django models

I have this project with scrapy, scrapyd and django. My crawler uses the django models to add the items to the database through the pipelines. What I did was use a single container to start scrapyd and the django server, but this gives the…
1
vote
1 answer

Scrapyd + Django in Docker: HTTPConnectionPool (host = '0.0.0.0', port = 6800) error

I am a young Italian boy looking for help. I'm building a web interface for my web scraper using django and scrapyd. It's my first experience with scrapy but I'm learning fast thanks to the good amount of documentation on the net. However, I find…
1
vote
0 answers

Integrating scrapy crawler with Django

Before you mark this as a duplicate, please read ahead. I have researched a lot and haven't found anything (although there is a different question with the same title). In my project, I want to take a URL from the user and scrape the URL using a Scrapy…
1
vote
1 answer

Scrapyd getting 504 Deploy failed after cloning server

I have a running Scrapyd instance. This instance has been cloned and is now up and running under another server IP. The cloned server works just fine, except that I can no longer deploy to the new IP. I am getting retry_on_eintr(check_call,…
merlin
  • 2,717
  • 3
  • 29
  • 59
1
vote
1 answer

HTTPCACHE not working when deploying scrapy project to scrapyd

When I try to use HTTPCACHE with scrapyd I get the following error: [scrapy] WARNING: Disabled Httpcache Middlware: unable to find scrapy.cfg file to infer project data dir
Acorn
  • 49,061
  • 27
  • 133
  • 172