Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `scrapy` itself, but was split out and is now a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
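
For example, scheduling a spider run and checking job status through that JSON API might look like the following sketch, assuming a Scrapyd instance on `localhost:6800` and a hypothetical project/spider named `myproject`/`myspider`:

```python
import requests

SCRAPYD = "http://localhost:6800"  # assumed local Scrapyd instance

# Schedule a run of the (hypothetical) spider "myspider" in project "myproject".
resp = requests.post(f"{SCRAPYD}/schedule.json",
                     data={"project": "myproject", "spider": "myspider"})
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}

# List pending/running/finished jobs for the same project.
jobs = requests.get(f"{SCRAPYD}/listjobs.json", params={"project": "myproject"})
print(jobs.json())
```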

355 questions
1 vote · 0 answers

Scrapyd dies after a specified time when deployed on a server

I have a spider deployed successfully on localhost, running day after day without any trouble. When deployed to my Ubuntu server, the process started and the spider ran, but after a short time my scrapyd process stopped without giving me a clue. I've…
xuke • 45 • 7
1 vote · 1 answer

Dynamic DEPTH_LIMIT as parameter in Scrapy, passed from Scrapyd

I am currently using Scrapyd to start a crawling spider and the DEPTH_LIMIT setting is set in the Scrapy App settings. I was wondering how to pass the depth_limit as a parameter in Scrapyd, allowing me to set it "dynamically" as requested by the…
Nicolò Gasparini • 2,228 • 2 • 24 • 53
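
For reference, Scrapyd's `schedule.json` endpoint accepts a `setting` argument that overrides a Scrapy setting for that single run, which is the usual way to make `DEPTH_LIMIT` dynamic. A minimal sketch with hypothetical project and spider names:

```python
import requests

# Override DEPTH_LIMIT for this run only via schedule.json's "setting" argument.
requests.post("http://localhost:6800/schedule.json", data={
    "project": "myproject",   # hypothetical project name
    "spider": "myspider",     # hypothetical spider name
    "setting": "DEPTH_LIMIT=3",
})
```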
1 vote · 1 answer

Scrapyd jobs not starting

I integrated Scrapy in my Django project following this guide. Unfortunately, whichever way I try, the spider jobs are not starting, even though schedule.json gives me a jobid in return. My views: @csrf_exempt @api_view(['POST']) def crawl_url(request): …
Nicolò Gasparini • 2,228 • 2 • 24 • 53
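
A rough sketch of such a Django REST Framework view forwarding to Scrapyd is shown below; the view name matches the excerpt, but the project, spider, and payload handling are assumptions, and a `jobid` in the reply only means the job was queued, not that the spider actually ran:

```python
import requests
from django.views.decorators.csrf import csrf_exempt
from rest_framework.decorators import api_view
from rest_framework.response import Response


@csrf_exempt
@api_view(["POST"])
def crawl_url(request):
    # Forward the requested URL to Scrapyd's schedule.json; extra fields such as
    # "url" are passed through to the spider as spider arguments.
    payload = {
        "project": "myproject",          # hypothetical project name
        "spider": "myspider",            # hypothetical spider name
        "url": request.data.get("url"),
    }
    resp = requests.post("http://localhost:6800/schedule.json", data=payload)
    return Response(resp.json())
```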
1 vote · 1 answer

How to skip Parent directories while scraping a File Type Website?

While scraping a basic folder-system website that uses directories to store files, yield scrapy.Request(url1, callback=self.parse) follows the links and scrapes all the content of the crawled link, but I usually run into the…
1 vote · 1 answer

Get the response if the site wasn't crawled due to robots.txt

I'm trying to crawl user-defined websites but am not able to crawl sites where robots.txt prevents crawling. That's fine, but I want to get a response I can show to the user, e.g. "the site you have entered doesn't allow to crawl due to…
Dhaval • 901 • 3 • 8 • 26
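
One common way to surface this to the user (an assumption here, not necessarily the accepted answer) relies on the fact that Scrapy's `RobotsTxtMiddleware` raises `IgnoreRequest`, which routes the request to its errback, where the spider can record a user-facing message:

```python
import scrapy
from scrapy.exceptions import IgnoreRequest


class UserSiteSpider(scrapy.Spider):
    name = "user_site"  # hypothetical spider name
    custom_settings = {"ROBOTSTXT_OBEY": True}

    def start_requests(self):
        # In the real project the URL would come from the user.
        yield scrapy.Request("https://example.com/", callback=self.parse,
                             errback=self.on_error)

    def parse(self, response):
        yield {"url": response.url, "status": response.status}

    def on_error(self, failure):
        if failure.check(IgnoreRequest):
            # RobotsTxtMiddleware dropped the request: robots.txt forbids crawling.
            self.logger.warning("Site disallows crawling via robots.txt: %s", failure.value)
```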
1 vote · 0 answers

Modules folder in Scrapinghub

I'm currently using Scrapinghub's Scrapy Cloud to host my 12 spiders (and 12 different projects). I'd like to have one folder with functions that are used by all 12 spiders, but I'm not sure of the best way to implement it without having 1 functions…
Axel Eriksson • 105 • 1 • 11
1 vote · 0 answers

Speed up scrapy spiders initialisation time

I have multiple Scrapy spiders that I need to run at the same time every 5 minutes. The issue is that they take almost 30 seconds to 1 minute to start. It seems that they all start their own Twisted engine, and so it takes a lot of time. I've looked into…
fast_cen • 1,297 • 3 • 11 • 28
1 vote · 1 answer

How to set max_proc_per_cpu in Scrapyd

I have the following two Scrapy projects with the following configurations. Project1's scrapy.cfg: [settings] default = Project1.settings [deploy] url = http://localhost:6800/ project = Project1 [scrapyd] eggs_dir = eggs logs_dir =…
Yuseferi • 7,931 • 11 • 67 • 103
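
For context, `max_proc_per_cpu` (and `max_proc`) are options of the Scrapyd daemon itself, read from its own configuration file (for example `/etc/scrapyd/scrapyd.conf` or a `scrapyd.conf` next to where the daemon is started), not from the project's Scrapy settings. A minimal sketch of that section:

```ini
[scrapyd]
# Cap concurrent Scrapy processes per CPU (Scrapyd's default is 4).
max_proc_per_cpu = 2
# Absolute cap on processes; 0 means no fixed cap (max_proc_per_cpu * CPUs is used).
max_proc = 0
```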
1 vote · 0 answers

2 RabbitMQ workers and 2 Scrapyd daemons running on 2 local Ubuntu instances, in which one of the RabbitMQ workers is not working

I am currently working on building a "Scrapy spiders control panel", in which I am testing the existing solution available at [Distributed Multi-user Scrapy Spiders Control Panel]…
1 vote · 0 answers

Scrapyd, Celery and Django running with Supervisor - GenericHTTPChannellProtocol Error

I'm using a project called Django Dynamic Scraper to build a basic web scraper on top of Django. Everything works fine in development, but when setting up on my Digital Ocean VPS I run into issues. I'm using Supervisor to keep three things…
Dean Sherwin • 478 • 5 • 13
1 vote · 0 answers

How do I add the same scrapy pipeline to any spider in scrapyd

I have several projects running in scrapyd and they all use the same pipeline, so how can I add this pipeline to every scheduled spider by default, without adding anything to the curl request, only having a flag in the default_scrapyd.conf file?
Jgaldos • 540 • 1 • 5 • 9
1 vote · 0 answers

Why does a Scrapyd-scheduled spider encounter 503 when trying to scrape a site?

I am learning about Python and scraping and wrote my first spider using Scrapy. It works fine when I run it locally to scrape my test site. I deployed the project to Scrapyd on my remote server, but when I schedule the spider to run…
Dark Star1 • 6,986 • 16 • 73 • 121
1 vote · 1 answer

FEED_EXPORT_ENCODING option not working for Items files in Scrapyd - Python Scrapy

I am scraping a Chinese website. I have FEED_EXPORT_ENCODING='utf-8' in my settings.py file. If I run my scraper via scrapy crawl myscraper -o output.json, then my output file shows correct Chinese. But if I start my scraper via Scrapyd, then the Items…
Umair Ayub • 19,358 • 14 • 72 • 146
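
One workaround often suggested for this (an assumption here, not necessarily the accepted answer) is to force the encoding per run through `schedule.json`'s `setting` argument, in addition to keeping `FEED_EXPORT_ENCODING = 'utf-8'` in settings.py:

```python
import requests

# settings.py already has FEED_EXPORT_ENCODING = "utf-8"; this also forces the
# setting for the Scrapyd-launched run via schedule.json's "setting" argument.
requests.post("http://localhost:6800/schedule.json", data={
    "project": "myproject",   # hypothetical project name
    "spider": "myscraper",    # spider name taken from the excerpt
    "setting": "FEED_EXPORT_ENCODING=utf-8",
})
```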
1 vote · 1 answer

How to install Crawlera via setuptools

I want to install Crawlera with setuptools in Docker. In my scrapy.cfg file I have: [deploy=test] url = http://localhost:6800/ project = Crawling. I test with scrapyd-deploy -l and I get: test http://localhost:6800/. In my setup.py I…
parik • 2,313 • 12 • 39 • 67
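
A minimal `setup.py` sketch for declaring the Crawlera middleware (the `scrapy-crawlera` package on PyPI) as a dependency of the egg that `scrapyd-deploy` builds; the project and settings module names are taken from the excerpt's `project = Crawling` and are otherwise assumptions, and the dependency still has to be available in the environment Scrapyd runs in:

```python
from setuptools import setup, find_packages

setup(
    name="Crawling",          # project name from the excerpt's scrapy.cfg
    version="1.0",
    packages=find_packages(),
    # scrapyd-deploy uses this entry point to locate the project settings.
    entry_points={"scrapy": ["settings = Crawling.settings"]},
    install_requires=[
        "scrapy-crawlera",    # Crawlera (Smart Proxy Manager) middleware
    ],
)
```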
1 vote · 0 answers

scrapyd service and periodic scraping in virtualenv

The first time I installed scrapyd on Ubuntu 14.04, I didn't use the generic way. Using apt-get, my scrapyd was set up as a service that can be started and has its (log/config/dbs...) dependencies; however, the scrapy version was very outdated. So I…
user2243952 • 277 • 3 • 6 • 12