Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `scrapy` itself but was split out and is now a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
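
To give a sense of that JSON web service, here is a minimal sketch of scheduling a run over HTTP; the host, project, and spider names are placeholders for your own deployment:

```python
import requests

# Schedule a spider run on a Scrapyd instance (assumed to be on localhost:6800);
# "myproject" and "myspider" are placeholders for a deployed project and spider.
resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "myproject", "spider": "myspider"},
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}
```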

355 questions
7 votes, 1 answer

Horizontally scaling Scrapyd

What tool or set of tools would you use for horizontally scaling Scrapyd, adding new machines to a Scrapyd cluster dynamically and running N instances per machine if required? It is not necessary for all the instances to share a common job queue, but…
gerosalesc
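
Scrapyd has no built-in clustering, so one common approach is client-side distribution: keep a list of Scrapyd hosts and round-robin schedule.json calls across them. A minimal sketch under that assumption, with hypothetical host names:

```python
import itertools
import requests

# Hypothetical pool of Scrapyd hosts; a real setup might pull this list
# from a service registry instead of hard-coding it.
SCRAPYD_HOSTS = itertools.cycle([
    "http://scrapyd-1:6800",
    "http://scrapyd-2:6800",
])

def schedule(project, spider, **spider_args):
    # Pick the next host in the rotation and schedule the job there.
    host = next(SCRAPYD_HOSTS)
    resp = requests.post(
        f"{host}/schedule.json",
        data={"project": project, "spider": spider, **spider_args},
    )
    return host, resp.json()
```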
7 votes, 1 answer

Scrapyd: where do I get to see the output of my crawler once I schedule it using scrapyd

I am new to scrapy and scrapyd. I did some reading and developed my crawler, which crawls a news website and gives me all the news articles from it. If I run the crawler simply with scrapy crawl spidername -o something.txt, it gives me all the scraped…
Yogesh D
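
Scrapyd writes each run's log into its logs directory and serves it over HTTP. A small sketch that lists finished jobs and prints their log URLs, assuming the default port and a project named myproject:

```python
import requests

base = "http://localhost:6800"
# listjobs.json returns pending/running/finished job lists for a project.
jobs = requests.get(f"{base}/listjobs.json", params={"project": "myproject"}).json()

for job in jobs.get("finished", []):
    # Scrapyd serves each run's log at /logs/<project>/<spider>/<jobid>.log
    print(f"{base}/logs/myproject/{job['spider']}/{job['id']}.log")
```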
7 votes, 2 answers

Building an egg of my Python project

Can somebody please guide me with a step-by-step procedure on how to eggify my existing Python project? The documentation keeps mentioning something about a setup.py within a package, but I cannot find it in my project... thank you,
Jin-Dominique
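
For a Scrapy project, scrapyd-deploy generates a stub setup.py on first deploy; written by hand it is only a few lines. A minimal example, assuming the Scrapy package is named project:

```python
from setuptools import setup, find_packages

setup(
    name="project",
    version="1.0",
    packages=find_packages(),
    # Tells Scrapy (and hence Scrapyd) where the settings module lives.
    entry_points={"scrapy": ["settings = project.settings"]},
)
```

Running python setup.py bdist_egg then builds the egg into dist/.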
7 votes, 4 answers

Error in deploying a project using scrapyd

I have multiple spiders in my project folder and want to run all the spiders at once, so I decided to run them using the scrapyd service. I started doing this by following the steps here. First of all, I am in the current project folder and have opened the scrapy.cfg…
Shiva Krishna Bavandla
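
Deploy errors usually trace back to the [deploy] section of scrapy.cfg. A typical minimal configuration, with placeholder names, looks like this:

```ini
[settings]
default = project.settings

[deploy]
url = http://localhost:6800/
project = project
```

With scrapyd running, scrapyd-deploy builds the egg and uploads it via the addversion.json endpoint.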
6 votes, 2 answers

I can't access scrapyd port 6800 from browser

I searched a lot on this; it may have a simple solution that I am missing. I have set up scrapy + scrapyd on both my local machine and my server. They both work fine when I run them as "scrapyd". I can deploy locally without a problem, and I can access…
Mehmet Kurtipek
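
When Scrapyd answers on the machine itself but not from other hosts, the usual suspect is the bind address: depending on the version, scrapyd may bind only to the loopback interface by default. A sketch of the relevant scrapyd.conf options:

```ini
[scrapyd]
# Listen on all interfaces instead of loopback only; also make sure
# port 6800 is open in the server's firewall / security group.
bind_address = 0.0.0.0
http_port    = 6800
```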
6 votes, 1 answer

How to pass parameters to scrapy crawler from scrapyd?

I can run a spider in scrapy with a simple command scrapy crawl custom_spider -a input_val=5 -a input_val2=6, where input_val and input_val2 are the values I'm passing to the spider, and the above method works fine. However, while scheduling a spider…
wolfgang
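
Scrapyd forwards any extra POST parameters of schedule.json to the spider as if they were -a arguments. A sketch, assuming the project is deployed under the name myproject:

```python
import requests

resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={
        "project": "myproject",    # assumed deployed project name
        "spider": "custom_spider",
        "input_val": 5,            # becomes the spider's input_val argument
        "input_val2": 6,
    },
)
print(resp.json())  # {"status": "ok", "jobid": "..."} on success
```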
6 votes, 1 answer

Scrapy how to ignore items with blank fields using Loader

I would like to know how to ignore items that don't fill all fields, some kind of dropping, because in the output of scrapyd I'm getting pages that don't fill all fields. I have this code:

    class Product(scrapy.Item):
        source_url = scrapy.Field( …
Rafael Capucho
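
One way to drop incomplete items (rather than doing it inside the loader) is a small item pipeline that raises DropItem; the field names below are assumptions based on the question:

```python
from scrapy.exceptions import DropItem

class RequiredFieldsPipeline:
    # Field names are assumptions; list whichever fields are mandatory.
    required_fields = ("source_url", "name", "price")

    def process_item(self, item, spider):
        for field in self.required_fields:
            if not item.get(field):
                # Dropped items never reach the feed output.
                raise DropItem(f"missing {field!r} in {item!r}")
        return item
```

Enable it through ITEM_PIPELINES in the project's settings.py.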
5 votes, 1 answer

Scrapy spider not working on Django after implementing WebSockets with Channels (cannot call it from an async context)

I'm opening a new question as I'm having an issue with Scrapy and Channels in a Django application, and I would appreciate it if someone could guide me in the right direction. The reason I'm using Channels is that I want to retrieve in real-time…
Askew
5 votes, 0 answers

How to add a new service to scrapyd from current project

I am trying to run multiple spiders at once, and I made my own custom command in scrapy. Now I am trying to run that command through scrapyd. I tried to add it as a new service to my scrapyd.conf, but it throws an error saying there is no such…
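
Custom endpoints are registered in the [services] section of scrapyd's configuration, alongside the stock ones; the custom class below is hypothetical, and the module must be importable by the scrapyd process (an unimportable path is a common cause of "no such ..." errors):

```ini
[services]
# Stock endpoint, shown for context.
schedule.json = scrapyd.webservice.Schedule
# Hypothetical custom endpoint backed by your own webservice class.
runall.json   = myproject.webservice.RunAll
```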
5 votes, 1 answer

Scrapyd can't find the code in a sub-directory

We have a quite normal Scrapy project, something like this:

    project/
        setup.py
        scrapy.cfg
        SOME_DIR_WITH_PYTHON_MODULE/
            __init__.py
        project/
            settings.py
            …
Spaceman
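
When scrapyd can't see a sibling module, the uploaded egg often simply doesn't contain it: the setup.py may list only the main package. Using find_packages() (as in the eggification sketch above) picks up any directory that has an __init__.py:

```python
from setuptools import setup, find_packages

setup(
    name="project",
    version="1.0",
    # find_packages() walks the tree and should include
    # SOME_DIR_WITH_PYTHON_MODULE as long as it has an __init__.py.
    packages=find_packages(),
    entry_points={"scrapy": ["settings = project.settings"]},
)
```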
5 votes, 1 answer

How to crawl multiple domains with scrapy

I have a project in which I have to crawl a great number of different sites. All of these sites can be crawled with the same spider, as I don't need to extract items from their body pages. The approach I thought of is to parametrize the domain to be crawled…
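
The parametrized-domain idea can be a spider that takes the domain as a spider argument; a minimal sketch with a hypothetical spider name:

```python
import scrapy

class SiteSpider(scrapy.Spider):
    name = "site"  # hypothetical spider name

    def __init__(self, domain=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The domain arrives as a spider argument (-a domain=...).
        self.allowed_domains = [domain]
        self.start_urls = [f"http://{domain}/"]

    def parse(self, response):
        # No item extraction needed per the question; just follow links.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```

Then scrapy crawl site -a domain=example.com locally, or one schedule.json call per domain when running under scrapyd.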
5 votes, 2 answers

How do I pass form data with Scrapy from the command line?

How could I pass username and password from the command line? Thanks!

    class LoginSpider(Spider):
        name = 'example.com'
        start_urls = ['http://www.example.com/users/login.php']

        def parse(self, response):
            return…
Theodis Butler
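
Spider arguments passed with -a land on the spider as constructor keyword arguments, so the question's spider can take the credentials and feed them into a FormRequest. A sketch filling in the truncated parse method; the form field names are assumptions about the login page:

```python
import scrapy

class LoginSpider(scrapy.Spider):
    name = "example.com"
    start_urls = ["http://www.example.com/users/login.php"]

    def __init__(self, username=None, password=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.username = username
        self.password = password

    def parse(self, response):
        # Field names "username"/"password" are assumptions about the form.
        return scrapy.FormRequest.from_response(
            response,
            formdata={"username": self.username, "password": self.password},
            callback=self.after_login,
        )

    def after_login(self, response):
        self.logger.info("logged in as %s", self.username)
```

Invoked as scrapy crawl example.com -a username=me -a password=secret.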
5 votes, 1 answer

Scrapyd init error when running scrapy spider

I'm trying to deploy a crawler with four spiders. One of the spiders uses XMLFeedSpider and runs fine from the shell and scrapyd, but the others use BaseSpider and all give this error when run in scrapyd, although they run fine from the shell: TypeError:…
Cruachan
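
A classic cause of this symptom is that scrapyd passes extra keyword arguments (such as the internal _job id) to the spider's constructor, so any overridden __init__ must accept and forward them; a sketch:

```python
import scrapy

class MySpider(scrapy.Spider):  # BaseSpider in Scrapy versions of that era
    name = "myspider"  # hypothetical

    def __init__(self, *args, **kwargs):
        # Accept scrapyd's extra kwargs (e.g. _job) instead of raising
        # TypeError on unexpected arguments, then forward them upward.
        super().__init__(*args, **kwargs)
        # ... spider-specific initialization ...
```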
5 votes, 3 answers

How to store scrapyd items in json format

I am trying to store scrapyd items in a JSON file. By default it stores items in a JSON file, but like this: File_1: {item1} {item2} .... And if I run my spider with scrapy crawl spidername -o fileName -t json, it will store items like…
user2173955
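
Scrapyd's per-job output defaults to JSON lines (one object per line). To get a single JSON document instead, the feed format can be pinned in the project settings; these are the legacy feed-export settings from the question's era (newer Scrapy uses the FEEDS dict), and the path is an assumption:

```python
# settings.py -- legacy feed-export settings; output path is an assumption
FEED_FORMAT = "json"  # one JSON array, not one object per line
FEED_URI = "file:///tmp/%(name)s-%(time)s.json"
```

Note that scrapyd's own items_dir option can override the feed URI (see the S3 question below).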
5 votes, 2 answers

Saving items from Scrapyd to Amazon S3 using Feed Exporter

Using Scrapy with Amazon S3 is fairly simple; you set:

    FEED_URI = 's3://MYBUCKET/feeds/%(name)s/%(time)s.jl'
    FEED_FORMAT = 'jsonlines'
    AWS_ACCESS_KEY_ID = [access key]
    AWS_SECRET_ACCESS_KEY = [secret key]

and everything works just fine. But…
arikg
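
The usual catch when moving that setup under scrapyd is that scrapyd's items_dir option injects its own feed URI, shadowing the project's FEED_URI; emptying it in scrapyd.conf hands feed exporting back to the project settings:

```ini
[scrapyd]
# Leave empty so Scrapyd does not override the project's FEED_URI.
items_dir =
```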