Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `scrapy` itself but was split out and is now a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
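
To give a sense of that JSON web service, here is a minimal sketch of scheduling a run over HTTP; the host, project, and spider names are placeholders for your own deployment:

```python
import requests

# Schedule a spider run on a Scrapyd instance (assumed to be on localhost:6800);
# "myproject" and "myspider" are placeholders for a deployed project and spider.
resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "myproject", "spider": "myspider"},
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}
```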

355 questions
7 votes, 1 answer

Horizontally scaling Scrapyd

What tool or set of tools would you use for horizontally scaling Scrapyd, adding new machines to a Scrapyd cluster dynamically and running N instances per machine if required? It is not necessary for all the instances to share a common job queue, but…
gerosalesc
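
Scrapyd has no built-in clustering, so one common approach is client-side distribution: keep a list of Scrapyd hosts and round-robin schedule.json calls across them. A minimal sketch under that assumption, with hypothetical host names:

```python
import itertools
import requests

# Hypothetical pool of Scrapyd hosts; a real setup might pull this list
# from a service registry instead of hard-coding it.
SCRAPYD_HOSTS = itertools.cycle([
    "http://scrapyd-1:6800",
    "http://scrapyd-2:6800",
])

def schedule(project, spider, **spider_args):
    # Pick the next host in the rotation and schedule the job there.
    host = next(SCRAPYD_HOSTS)
    resp = requests.post(
        f"{host}/schedule.json",
        data={"project": project, "spider": spider, **spider_args},
    )
    return host, resp.json()
```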
7 votes, 1 answer

Scrapyd: where do I get to see the output of my crawler once I schedule it using scrapyd

I am new to scrapy and scrapyd. I did some reading and developed my crawler, which crawls a news website and gives me all the news articles from it. If I run the crawler simply with scrapy crawl spidername -o something.txt, it gives me all the scraped…
Yogesh D
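
Scrapyd writes each run's log into its logs directory and serves it over HTTP. A small sketch that lists finished jobs and prints their log URLs, assuming the default port and a project named myproject:

```python
import requests

base = "http://localhost:6800"
# listjobs.json returns pending/running/finished job lists for a project.
jobs = requests.get(f"{base}/listjobs.json", params={"project": "myproject"}).json()

for job in jobs.get("finished", []):
    # Scrapyd serves each run's log at /logs/<project>/<spider>/<jobid>.log
    print(f"{base}/logs/myproject/{job['spider']}/{job['id']}.log")
```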
7 votes, 2 answers

Building an egg of my Python project

Can somebody please guide me with a step-by-step procedure on how to eggify my existing Python project? The documentation keeps mentioning something about a setup.py within a package, but I cannot find it in my project... thank you,
Jin-Dominique
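
For a Scrapy project, scrapyd-deploy generates a stub setup.py on first deploy; written by hand it is only a few lines. A minimal example, assuming the Scrapy package is named project:

```python
from setuptools import setup, find_packages

setup(
    name="project",
    version="1.0",
    packages=find_packages(),
    # Tells Scrapy (and hence Scrapyd) where the settings module lives.
    entry_points={"scrapy": ["settings = project.settings"]},
)
```

Running python setup.py bdist_egg then builds the egg into dist/.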
7 votes, 4 answers

Error in deploying a project using scrapyd

I have multiple spiders in my project folder and want to run all the spiders at once, so I decided to run them using the scrapyd service. I started doing this by following the steps here. First of all, I am in the current project folder and have opened the scrapy.cfg…
Shiva Krishna Bavandla
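
Deploy errors usually trace back to the [deploy] section of scrapy.cfg. A typical minimal configuration, with placeholder names, looks like this:

```ini
[settings]
default = project.settings

[deploy]
url = http://localhost:6800/
project = project
```

With scrapyd running, scrapyd-deploy builds the egg and uploads it via the addversion.json endpoint.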
6 votes, 2 answers

I can't access scrapyd port 6800 from browser

I searched a lot on this; it may have a simple solution that I am missing. I have set up scrapy + scrapyd on both my local machine and my server. They both work fine when I run them as "scrapyd". I can deploy locally without a problem, and I can access…
Mehmet Kurtipek
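
When Scrapyd answers on the machine itself but not from other hosts, the usual suspect is the bind address: depending on the version, scrapyd may bind only to the loopback interface by default. A sketch of the relevant scrapyd.conf options:

```ini
[scrapyd]
# Listen on all interfaces instead of loopback only; also make sure
# port 6800 is open in the server's firewall / security group.
bind_address = 0.0.0.0
http_port    = 6800
```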
6 votes, 1 answer

How to pass parameters to scrapy crawler from scrapyd?

I can run a spider in scrapy with a simple command scrapy crawl custom_spider -a input_val=5 -a input_val2=6, where input_val and input_val2 are the values I'm passing to the spider, and the above method works fine. However, while scheduling a spider…
wolfgang
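
Scrapyd forwards any extra POST parameters of schedule.json to the spider as if they were -a arguments. A sketch, assuming the project is deployed under the name myproject:

```python
import requests

resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={
        "project": "myproject",    # assumed deployed project name
        "spider": "custom_spider",
        "input_val": 5,            # becomes the spider's input_val argument
        "input_val2": 6,
    },
)
print(resp.json())  # {"status": "ok", "jobid": "..."} on success
```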
6 votes, 1 answer

Scrapy how to ignore items with blank fields using Loader

I would like to know how to ignore items that don't fill all fields, some kind of dropping, because in the output of scrapyd I'm getting pages that don't fill all fields. I have this code:

    class Product(scrapy.Item):
        source_url = scrapy.Field( …
Rafael Capucho
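
One way to drop incomplete items (rather than doing it inside the loader) is a small item pipeline that raises DropItem; the field names below are assumptions based on the question:

```python
from scrapy.exceptions import DropItem

class RequiredFieldsPipeline:
    # Field names are assumptions; list whichever fields are mandatory.
    required_fields = ("source_url", "name", "price")

    def process_item(self, item, spider):
        for field in self.required_fields:
            if not item.get(field):
                # Dropped items never reach the feed output.
                raise DropItem(f"missing {field!r} in {item!r}")
        return item
```

Enable it through ITEM_PIPELINES in the project's settings.py.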
5 votes, 1 answer

Scrapy spider not working on Django after implementing WebSockets with Channels (cannot call it from an async context)

I'm opening a new question as I'm having an issue with Scrapy and Channels in a Django application, and I would appreciate it if someone could guide me in the right direction. The reason I'm using Channels is that I want to retrieve in real-time…
Askew
5 votes, 0 answers

How to add a new service to scrapyd from current project

I am trying to run multiple spiders at once, and I made my own custom command in scrapy. Now I am trying to run that command through scrapyd. I tried to add it as a new service to my scrapyd.conf, but it throws an error saying there is no such…
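
Custom endpoints are registered in the [services] section of scrapyd's configuration, alongside the stock ones; the custom class below is hypothetical, and the module must be importable by the scrapyd process (an unimportable path is a common cause of "no such ..." errors):

```ini
[services]
# Stock endpoint, shown for context.
schedule.json = scrapyd.webservice.Schedule
# Hypothetical custom endpoint backed by your own webservice class.
runall.json   = myproject.webservice.RunAll
```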
5 votes, 1 answer

Scrapyd can't find the code in a sub-directory

We have a quite normal Scrapy project, something like this:

    project/
        setup.py
        scrapy.cfg
        SOME_DIR_WITH_PYTHON_MODULE/
            __init__.py
        project/
            settings.py
            …
Spaceman
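
When scrapyd can't see a sibling module, the uploaded egg often simply doesn't contain it: the setup.py may list only the main package. Using find_packages() (as in the eggification sketch above) picks up any directory that has an __init__.py:

```python
from setuptools import setup, find_packages

setup(
    name="project",
    version="1.0",
    # find_packages() walks the tree and should include
    # SOME_DIR_WITH_PYTHON_MODULE as long as it has an __init__.py.
    packages=find_packages(),
    entry_points={"scrapy": ["settings = project.settings"]},
)
```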
5 votes, 1 answer

How to crawl multiple domains with scrapy

I have a project in which I have to crawl a great number of different sites. All of these sites can be crawled with the same spider, as I don't need to extract items from their body pages. The approach I thought of is to parametrize the domain to be crawled…
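
The parametrized-domain idea can be a spider that takes the domain as a spider argument; a minimal sketch with a hypothetical spider name:

```python
import scrapy

class SiteSpider(scrapy.Spider):
    name = "site"  # hypothetical spider name

    def __init__(self, domain=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The domain arrives as a spider argument (-a domain=...).
        self.allowed_domains = [domain]
        self.start_urls = [f"http://{domain}/"]

    def parse(self, response):
        # No item extraction needed per the question; just follow links.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```

Then scrapy crawl site -a domain=example.com locally, or one schedule.json call per domain when running under scrapyd.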
5 votes, 2 answers

How do I pass form data with Scrapy from the command line?

How could I pass username and password from the command line? Thanks!

    class LoginSpider(Spider):
        name = 'example.com'
        start_urls = ['http://www.example.com/users/login.php']

        def parse(self, response):
            return…
Theodis Butler
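
Spider arguments passed with -a land on the spider as constructor keyword arguments, so the question's spider can take the credentials and feed them into a FormRequest. A sketch filling in the truncated parse method; the form field names are assumptions about the login page:

```python
import scrapy

class LoginSpider(scrapy.Spider):
    name = "example.com"
    start_urls = ["http://www.example.com/users/login.php"]

    def __init__(self, username=None, password=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.username = username
        self.password = password

    def parse(self, response):
        # Field names "username"/"password" are assumptions about the form.
        return scrapy.FormRequest.from_response(
            response,
            formdata={"username": self.username, "password": self.password},
            callback=self.after_login,
        )

    def after_login(self, response):
        self.logger.info("logged in as %s", self.username)
```

Invoked as scrapy crawl example.com -a username=me -a password=secret.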
5 votes, 1 answer

Scrapyd init error when running scrapy spider

I'm trying to deploy a crawler with four spiders. One of the spiders uses XMLFeedSpider and runs fine from the shell and scrapyd, but the others use BaseSpider and all give this error when run in scrapyd, although they run fine from the shell: TypeError:…
Cruachan
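
A classic cause of this symptom is that scrapyd passes extra keyword arguments (such as the internal _job id) to the spider's constructor, so any overridden __init__ must accept and forward them; a sketch:

```python
import scrapy

class MySpider(scrapy.Spider):  # BaseSpider in Scrapy versions of that era
    name = "myspider"  # hypothetical

    def __init__(self, *args, **kwargs):
        # Accept scrapyd's extra kwargs (e.g. _job) instead of raising
        # TypeError on unexpected arguments, then forward them upward.
        super().__init__(*args, **kwargs)
        # ... spider-specific initialization ...
```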
5 votes, 3 answers

How to store scrapyd items in json format

I am trying to store scrapyd items in a JSON file. By default it stores items in a JSON file, but like this: File_1: {item1} {item2} .... And if I run my spider with scrapy crawl spidername -o fileName -t json, it will store items like…
user2173955
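
Scrapyd's per-job output defaults to JSON lines (one object per line). To get a single JSON document instead, the feed format can be pinned in the project settings; these are the legacy feed-export settings from the question's era (newer Scrapy uses the FEEDS dict), and the path is an assumption:

```python
# settings.py -- legacy feed-export settings; output path is an assumption
FEED_FORMAT = "json"  # one JSON array, not one object per line
FEED_URI = "file:///tmp/%(name)s-%(time)s.json"
```

Note that scrapyd's own items_dir option can override the feed URI (see the S3 question below).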
5 votes, 2 answers

Saving items from Scrapyd to Amazon S3 using Feed Exporter

Using Scrapy with Amazon S3 is fairly simple; you set:

    FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'
    FEED_FORMAT = 'jsonlines'
    AWS_ACCESS_KEY_ID = [access key]
    AWS_SECRET_ACCESS_KEY = [secret key]

and everything works just fine. But…
arikg
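
The usual catch when moving that setup under scrapyd is that scrapyd's items_dir option injects its own feed URI, shadowing the project's FEED_URI; emptying it in scrapyd.conf hands feed exporting back to the project settings:

```ini
[scrapyd]
# Leave empty so Scrapyd does not override the project's FEED_URI.
items_dir =
```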