Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `scrapy` itself, but was later split out into a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
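
For illustration, a minimal sketch of driving that JSON web service from Python with the `requests` library (the project and spider names are placeholders; scrapyd is assumed to be listening on its default port, 6800):

```python
import requests

# Schedule a run of spider "myspider" from the deployed project "myproject"
# (both names are placeholders) on a local scrapyd instance.
resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "myproject", "spider": "myspider"},
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}
```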

355 questions
0
votes
2 answers

Can't deploy scrapy to scrapyd server

I'm trying to deploy a spider created via Portia. Portia and scrapyd are both the latest versions. I'm running the scrapyd server simply with the command scrapyd. I'm getting this result on my machine locally: $> cd PROJECT_PATH_HERE; $> scrapyd-deploy Packing version…
artyomboyko
  • 2,781
  • 5
  • 40
  • 54
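
One frequent culprit when scrapyd-deploy fails is a missing or misconfigured deploy target in scrapy.cfg; a minimal sketch, with placeholder target and project names:

```
# scrapy.cfg (sketch; "local" and "myproject" are placeholders)
[deploy:local]
url = http://localhost:6800/
project = myproject
```

With that in place, `scrapyd-deploy local` packs the project into an egg and uploads it to the listed URL.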
0
votes
1 answer

scrapyd or CrawlerProcess for parallel parsing

I need to run a lot of spiders (~20-50) in parallel on the same server. Some of my spiders run for more than two days, and sometimes I need to start a new one before all the running processes have finished. As I understand it, scrapyd provides such a possibility…
Rainmaker
  • 10,294
  • 9
  • 54
  • 89
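
The trade-off in a nutshell: CrawlerProcess must know every spider before it starts, while scrapyd keeps accepting new jobs over schedule.json while others are still running, which matches the "start a new one mid-run" requirement. A minimal CrawlerProcess sketch (spider names are placeholders, and the script must run inside the Scrapy project):

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl("spider_one")  # placeholder spider names, resolved by name
process.crawl("spider_two")  # from the project's spider modules
process.start()  # blocks the process until every crawl has finished
```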
0
votes
1 answer

How to perform multiple requests to a page in parallel with different request parameter values?

I have to scrape a page on a site to which I have to post a parameter, but I have an array of values with which to request the same page. I don't want to scrape the page for each value of the array sequentially. I want to scrape it in parallel (that means search…
Manikandan Arunachalam
  • 1,470
  • 3
  • 17
  • 32
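
Scrapy's downloader is asynchronous, so yielding one request per parameter value from start_requests already runs them in parallel (up to CONCURRENT_REQUESTS). A sketch with a hypothetical endpoint and parameter name:

```python
import scrapy

class ParamSpider(scrapy.Spider):
    name = "param_spider"
    search_values = ["alpha", "beta", "gamma"]  # hypothetical array of values

    def start_requests(self):
        for value in self.search_values:
            # Each POST is scheduled immediately; Scrapy fetches them concurrently.
            yield scrapy.FormRequest(
                "https://example.com/search",  # hypothetical URL
                formdata={"query": value},     # hypothetical parameter name
                callback=self.parse,
            )

    def parse(self, response):
        yield {"url": response.url, "status": response.status}
```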
0
votes
1 answer

Relative path in scrapyd

```python
import scrapy
import csv
from series.items import SeriesItem

class EpisodeScraperSpider(scrapy.Spider):
    name = "episode_scraper"
    allowed_domains = ["imdb.com"]
    start_urls = []

    def __init__(self, id=None, series=None, *args, …
```
April
  • 1
  • 1
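
The usual cause here: scrapyd runs the project from an egg in its own working directory, so relative file paths that worked in development no longer resolve. One hedged workaround is to package the data file and read it with the standard library's pkgutil (the package and file names below are hypothetical):

```python
import pkgutil

# Reads resources/episodes.csv from inside the "series" package, whether the
# project runs from source or from the egg that scrapyd deployed.
raw = pkgutil.get_data("series", "resources/episodes.csv")
rows = raw.decode("utf-8").splitlines()
```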
0
votes
1 answer

scrapyd_api.exceptions.ScrapydResponseError: disk I/O error

When I schedule my spider, it fails with scrapyd_api.exceptions.ScrapydResponseError: disk I/O error. Why does this kind of problem occur?
Pythonsguru
  • 424
  • 3
  • 11
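
For context, this is how a spider is typically scheduled through the python-scrapyd-api wrapper that raises this exception (names are placeholders). Worth noting: "disk I/O error" is SQLite's wording, which suggests the problem lies in scrapyd's own on-disk databases (permissions or disk space) rather than in this client code:

```python
from scrapyd_api import ScrapydAPI

scrapyd = ScrapydAPI("http://localhost:6800")       # scrapyd endpoint (placeholder)
job_id = scrapyd.schedule("myproject", "myspider")  # placeholder names
print(job_id)
```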
0
votes
1 answer

Invoke scrapy script from PHP/HTML or webserver

I have a scrapy script which runs multiple spiders. I want to crawl a single spider, or run the multiple-spider script, from PHP/HTML code. Is that possible? If so, how?
Sabeena
  • 85
  • 12
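
Because scrapyd exposes plain HTTP endpoints, any web stack (PHP included) can trigger a crawl with an ordinary HTTP POST; the shape of the call, as a sketch with placeholder names:

```
curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider
```

From PHP the same POST can be issued with cURL or any HTTP client.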
0
votes
0 answers

scrapy/scrapyd can't process more than ~200 items per minute

I have a working scrapy spider deployed on an Amazon EC2 instance (c4.xlarge) and running under scrapyd. No matter what I do, I can't seem to top ~200 processed items per minute (according to the scrapy logs). I tried playing around with scrapyd…
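
For anyone hitting a similar ceiling, the usual suspects are Scrapy's concurrency settings and AutoThrottle; a hedged sketch of the relevant knobs in settings.py (values are illustrative, not recommendations):

```python
# settings.py -- throughput-related knobs (illustrative values)
CONCURRENT_REQUESTS = 64             # Scrapy's default is 16
CONCURRENT_REQUESTS_PER_DOMAIN = 32  # default is 8
DOWNLOAD_DELAY = 0                   # any delay directly caps items/minute
AUTOTHROTTLE_ENABLED = False         # AutoThrottle deliberately slows crawling
```

On the scrapyd side, the max_proc and max_proc_per_cpu options in scrapyd.conf bound how many spider processes run concurrently.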
0
votes
1 answer

Access django models in scrapy with split settings

I am trying to use Django app models within scrapy but keep getting the error "No module named myapp.apps.myscraping.myscraping.items" when trying to deploy the scrapy code to scrapyd using scrapyd-deploy. The problem is that I have split…
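
The standard way to make Django models importable from Scrapy is to bootstrap Django before anything touches the models, usually at the top of Scrapy's settings module; a sketch with hypothetical paths and module names:

```python
import os
import sys

import django

sys.path.append("/path/to/django/project")                # hypothetical path
os.environ["DJANGO_SETTINGS_MODULE"] = "mysite.settings"  # hypothetical module
django.setup()  # after this, `from myapp.models import ...` works
```

Separately, a "No module named …" error from scrapyd-deploy often means the egg's setup.py does not list every package in the split layout.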
0
votes
1 answer

How does scrapyd deploy a project to a remote server?

scrapyd can help deploy a project. It seems like it creates an egg and uploads it? But where does it get uploaded to? I mean: when I use scrapyd-deploy -p to deploy to http://1.2.3.4:6800, can I find it on http://1.2.3.4:6800? And the…
user2492364
  • 6,543
  • 22
  • 77
  • 147
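
For the record, scrapyd-deploy builds an egg and POSTs it to the target's addversion.json endpoint; scrapyd then stores it under the eggs_dir configured in scrapyd.conf (eggs by default) rather than exposing it as a browsable page. The upload can be verified through the JSON API, e.g. (the project name is a placeholder):

```
curl http://1.2.3.4:6800/listprojects.json
curl "http://1.2.3.4:6800/listversions.json?project=myproject"
```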
0
votes
0 answers

How are fields stored in a list in a Portia crawl?

EDIT: I am seeing that, while running a Portia spider, the extracted fields are stored in a Python list and the values are returned while the extracted details are logged in scrapyd. I just want to know how the fields are being extracted and…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
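
Part of the answer is that Scrapy selectors, which Portia-generated spiders are built on, return lists, since a CSS or XPath expression can match several nodes. A self-contained sketch:

```python
from scrapy.selector import Selector

html = "<html><body><h1>First</h1><h1>Second</h1></body></html>"
sel = Selector(text=html)

# .getall() always returns a list, even when only one node matches,
# which is why extracted fields show up as lists in the scrapyd logs.
print(sel.css("h1::text").getall())  # ['First', 'Second']
```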
0
votes
1 answer

How do I get request and response counts in scrapyd?

I am trying to get request and response counts in scrapyd while running multiple spiders (8 spiders, dynamically). I am trying to get those counts using Python. The following counts: {'downloader/request_bytes': 130427, …
Karthick
  • 55
  • 8
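
Those counters live in Scrapy's stats collector, which each spider can read for itself even when scrapyd runs eight of them at once; a sketch using standard stats keys (the spider name and URL are placeholders):

```python
import scrapy

class StatsDemoSpider(scrapy.Spider):
    name = "stats_demo"  # placeholder
    start_urls = ["http://www.example.com"]

    def parse(self, response):
        yield {"url": response.url}

    def closed(self, reason):
        # Called when the spider finishes; the keys below are standard Scrapy stats.
        stats = self.crawler.stats.get_stats()
        self.logger.info("requests: %s", stats.get("downloader/request_count"))
        self.logger.info("responses: %s", stats.get("downloader/response_count"))
```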
0
votes
1 answer

Python EGG: include top-level dir

We have something similar to this:

```
/dir/
    subdir1/
        subdir1-1/   (some module)
            setup.py
        subdir1-2/   (another module)
    subdir2/         (another module)
```

The code…
Spaceman
  • 1,185
  • 4
  • 17
  • 31
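
One common, hedged resolution for this kind of layout is to place setup.py at the top-level directory so setuptools can discover every module beneath it; a minimal sketch (the project name is hypothetical, and each subdirectory needs an __init__.py to be found):

```python
# /dir/setup.py -- sketch
from setuptools import setup, find_packages

setup(
    name="project",            # hypothetical
    version="1.0",
    packages=find_packages(),  # finds every package (dir with __init__.py) under /dir/
)
```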
0
votes
1 answer

How does scrapy write to the log while running a spider?

While running a scrapy spider, I see that the log messages have "DEBUG:" lines such as: 1. DEBUG: Crawled (200) (GET http://www.example.com) (referer: None) 2. DEBUG: Scraped from (200 http://www.example.com) I want to know: 1. what do those…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
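
For orientation, both messages come from Scrapy itself: the "Crawled (200)" line is logged when a response is received, and "Scraped from" when an item is yielded from a callback. A spider can add its own DEBUG lines through self.logger; a sketch:

```python
import scrapy

class LogDemoSpider(scrapy.Spider):
    name = "log_demo"  # placeholder
    start_urls = ["http://www.example.com"]

    def parse(self, response):
        # Produces a line like: DEBUG: parsing http://www.example.com
        self.logger.debug("parsing %s", response.url)
        # Yielding an item triggers the "Scraped from (200 ...)" DEBUG line.
        yield {"title": response.css("title::text").get()}
```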
0
votes
1 answer

Mock in production code

I'll try to be direct here: I'm a Ruby programmer, beginning with Python, and I'm trying something with Scrapy. I'm looking into code my company received from a third party, and one thing is breaking my Scrapyd deploy (on the cloud): ImportError:…
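
A likely reading of that traceback: the third-party code imports mock at module level, and the package is absent where scrapyd builds or runs the egg. mock has been part of the standard library as unittest.mock since Python 3.3, so the usual guard looks like this (a sketch):

```python
try:
    from unittest import mock  # standard library on Python 3.3+
except ImportError:
    import mock  # the PyPI backport; must be installed wherever scrapyd runs
```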
0
votes
1 answer

Installed Scrapyd on Rackspace and want to access it from a local machine

I have installed the Scrapyd service on Rackspace. I have a dedicated IP. I can access scrapyd only on the server itself [http://localhost:6800]. I want to access it from my local computer and deploy spiders later on. I need help accessing it like…
yasirnazir
  • 1,133
  • 7
  • 12
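
Scrapyd's listening address comes from its config file (one of the locations it reads is /etc/scrapyd/scrapyd.conf); a sketch that binds it to all interfaces so the dedicated IP works, with the caveat that the port should then be restricted by a firewall:

```
[scrapyd]
bind_address = 0.0.0.0   # listen on every interface, not just localhost
http_port    = 6800
```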