Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `scrapy` itself, but was later split out into a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
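
For illustration, a minimal sketch of driving that JSON web service from Python with the `requests` library (the project and spider names are placeholders; scrapyd is assumed to be listening on its default port, 6800):

```python
import requests

# Schedule a run of spider "myspider" from the deployed project "myproject"
# (both names are placeholders) on a local scrapyd instance.
resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "myproject", "spider": "myspider"},
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}
```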

355 questions
0
votes
2 answers

Can't deploy scrapy to scrapyd server

I'm trying to deploy a spider created via Portia. Portia and scrapyd are both the latest versions. I'm running the scrapyd server simply with the command scrapyd. I'm getting this result on my machine locally: $> cd PROJECT_PATH_HERE; $> scrapyd-deploy Packing version…
artyomboyko
  • 2,781
  • 5
  • 40
  • 54
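
One frequent culprit when scrapyd-deploy fails is a missing or misconfigured deploy target in scrapy.cfg; a minimal sketch, with placeholder target and project names:

```
# scrapy.cfg (sketch; "local" and "myproject" are placeholders)
[deploy:local]
url = http://localhost:6800/
project = myproject
```

With that in place, `scrapyd-deploy local` packs the project into an egg and uploads it to the listed URL.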
0
votes
1 answer

scrapyd or CrawlerProcess for parallel parsing

I need to run a lot of spiders (~20-50) in parallel on the same server. Some of my spiders run for more than two days, and sometimes I need to start a new one before all the running processes have finished. As I understand it, scrapyd provides such a possibility…
Rainmaker
  • 10,294
  • 9
  • 54
  • 89
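
The trade-off in a nutshell: CrawlerProcess must know every spider before it starts, while scrapyd keeps accepting new jobs over schedule.json while others are still running, which matches the "start a new one mid-run" requirement. A minimal CrawlerProcess sketch (spider names are placeholders, and the script must run inside the Scrapy project):

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl("spider_one")  # placeholder spider names, resolved by name
process.crawl("spider_two")  # from the project's spider modules
process.start()  # blocks the process until every crawl has finished
```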
0
votes
1 answer

How to perform multiple requests to a page in parallel with different request parameter values?

I have to scrape a page on a site to which I have to post a parameter, but I have an array of values with which to request the same page. I don't want to scrape the page for each value of the array sequentially. I want to scrape it in parallel (that means search…
Manikandan Arunachalam
  • 1,470
  • 3
  • 17
  • 32
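
Scrapy's downloader is asynchronous, so yielding one request per parameter value from start_requests already runs them in parallel (up to CONCURRENT_REQUESTS). A sketch with a hypothetical endpoint and parameter name:

```python
import scrapy

class ParamSpider(scrapy.Spider):
    name = "param_spider"
    search_values = ["alpha", "beta", "gamma"]  # hypothetical array of values

    def start_requests(self):
        for value in self.search_values:
            # Each POST is scheduled immediately; Scrapy fetches them concurrently.
            yield scrapy.FormRequest(
                "https://example.com/search",  # hypothetical URL
                formdata={"query": value},     # hypothetical parameter name
                callback=self.parse,
            )

    def parse(self, response):
        yield {"url": response.url, "status": response.status}
```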
0
votes
1 answer

Relative path in scrapyd

```python
import scrapy
import csv
from series.items import SeriesItem

class EpisodeScraperSpider(scrapy.Spider):
    name = "episode_scraper"
    allowed_domains = ["imdb.com"]
    start_urls = []

    def __init__(self, id=None, series=None, *args, …
```
April
  • 1
  • 1
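
The usual cause here: scrapyd runs the project from an egg in its own working directory, so relative file paths that worked in development no longer resolve. One hedged workaround is to package the data file and read it with the standard library's pkgutil (the package and file names below are hypothetical):

```python
import pkgutil

# Reads resources/episodes.csv from inside the "series" package, whether the
# project runs from source or from the egg that scrapyd deployed.
raw = pkgutil.get_data("series", "resources/episodes.csv")
rows = raw.decode("utf-8").splitlines()
```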
0
votes
1 answer

scrapyd_api.exceptions.ScrapydResponseError: disk I/O error

When I schedule my spider, it fails with scrapyd_api.exceptions.ScrapydResponseError: disk I/O error. Why does this kind of problem occur?
Pythonsguru
  • 424
  • 3
  • 11
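
For context, this is how a spider is typically scheduled through the python-scrapyd-api wrapper that raises this exception (names are placeholders). Worth noting: "disk I/O error" is SQLite's wording, which suggests the problem lies in scrapyd's own on-disk databases (permissions or disk space) rather than in this client code:

```python
from scrapyd_api import ScrapydAPI

scrapyd = ScrapydAPI("http://localhost:6800")       # scrapyd endpoint (placeholder)
job_id = scrapyd.schedule("myproject", "myspider")  # placeholder names
print(job_id)
```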
0
votes
1 answer

Invoke scrapy script from PHP/HTML or webserver

I have a scrapy script which runs multiple spiders. I want to crawl a single spider, or run the multiple-spider script, from PHP/HTML code. Is that possible? If so, how?
Sabeena
  • 85
  • 12
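
Because scrapyd exposes plain HTTP endpoints, any web stack (PHP included) can trigger a crawl with an ordinary HTTP POST; the shape of the call, as a sketch with placeholder names:

```
curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider
```

From PHP the same POST can be issued with cURL or any HTTP client.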
0
votes
0 answers

scrapy/scrapyd can't process more than ~200 items per minute

I have a working scrapy spider deployed on an Amazon EC2 instance (c4.xlarge) and running under scrapyd. No matter what I do, I can't seem to top ~200 processed items per minute (according to the scrapy logs). I tried playing around with scrapyd…
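
For anyone hitting a similar ceiling, the usual suspects are Scrapy's concurrency settings and AutoThrottle; a hedged sketch of the relevant knobs in settings.py (values are illustrative, not recommendations):

```python
# settings.py -- throughput-related knobs (illustrative values)
CONCURRENT_REQUESTS = 64             # Scrapy's default is 16
CONCURRENT_REQUESTS_PER_DOMAIN = 32  # default is 8
DOWNLOAD_DELAY = 0                   # any delay directly caps items/minute
AUTOTHROTTLE_ENABLED = False         # AutoThrottle deliberately slows crawling
```

On the scrapyd side, the max_proc and max_proc_per_cpu options in scrapyd.conf bound how many spider processes run concurrently.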
0
votes
1 answer

Access django models in scrapy with split settings

I am trying to use Django app models within scrapy but keep getting the error "No module named myapp.apps.myscraping.myscraping.items" when trying to deploy the scrapy code to scrapyd using scrapyd-deploy. The problem is that I have split…
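
The standard way to make Django models importable from Scrapy is to bootstrap Django before anything touches the models, usually at the top of Scrapy's settings module; a sketch with hypothetical paths and module names:

```python
import os
import sys

import django

sys.path.append("/path/to/django/project")                # hypothetical path
os.environ["DJANGO_SETTINGS_MODULE"] = "mysite.settings"  # hypothetical module
django.setup()  # after this, `from myapp.models import ...` works
```

Separately, a "No module named …" error from scrapyd-deploy often means the egg's setup.py does not list every package in the split layout.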
0
votes
1 answer

How does scrapyd deploy a project to a remote server?

scrapyd can help deploy a project. It seems like it creates an egg and uploads it? But where does it get uploaded to? I mean: when I use scrapyd-deploy -p to deploy to http://1.2.3.4:6800, can I find it on http://1.2.3.4:6800? And the…
user2492364
  • 6,543
  • 22
  • 77
  • 147
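
For the record, scrapyd-deploy builds an egg and POSTs it to the target's addversion.json endpoint; scrapyd then stores it under the eggs_dir configured in scrapyd.conf (eggs by default) rather than exposing it as a browsable page. The upload can be verified through the JSON API, e.g. (the project name is a placeholder):

```
curl http://1.2.3.4:6800/listprojects.json
curl "http://1.2.3.4:6800/listversions.json?project=myproject"
```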
0
votes
0 answers

How are fields stored in a list in a Portia crawl?

EDIT: I am seeing that, while running a Portia spider, the extracted fields are stored in a Python list and the values are returned while the extracted details are logged in scrapyd. I just want to know how the fields are being extracted and…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
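
Part of the answer is that Scrapy selectors, which Portia-generated spiders are built on, return lists, since a CSS or XPath expression can match several nodes. A self-contained sketch:

```python
from scrapy.selector import Selector

html = "<html><body><h1>First</h1><h1>Second</h1></body></html>"
sel = Selector(text=html)

# .getall() always returns a list, even when only one node matches,
# which is why extracted fields show up as lists in the scrapyd logs.
print(sel.css("h1::text").getall())  # ['First', 'Second']
```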
0
votes
1 answer

How do I get request and response counts in scrapyd?

I am trying to get request and response counts in scrapyd while running multiple spiders (8 spiders, dynamically). I am trying to get those counts using Python. The following counts: {'downloader/request_bytes': 130427, …
Karthick
  • 55
  • 8
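
Those counters live in Scrapy's stats collector, which each spider can read for itself even when scrapyd runs eight of them at once; a sketch using standard stats keys (the spider name and URL are placeholders):

```python
import scrapy

class StatsDemoSpider(scrapy.Spider):
    name = "stats_demo"  # placeholder
    start_urls = ["http://www.example.com"]

    def parse(self, response):
        yield {"url": response.url}

    def closed(self, reason):
        # Called when the spider finishes; the keys below are standard Scrapy stats.
        stats = self.crawler.stats.get_stats()
        self.logger.info("requests: %s", stats.get("downloader/request_count"))
        self.logger.info("responses: %s", stats.get("downloader/response_count"))
```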
0
votes
1 answer

Python EGG: include top-level dir

We have something similar to this:

```
/dir/
    subdir1/
        subdir1-1/   (some module)
            setup.py
        subdir1-2/   (another module)
    subdir2/         (another module)
```

The code…
Spaceman
  • 1,185
  • 4
  • 17
  • 31
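
One common, hedged resolution for this kind of layout is to place setup.py at the top-level directory so setuptools can discover every module beneath it; a minimal sketch (the project name is hypothetical, and each subdirectory needs an __init__.py to be found):

```python
# /dir/setup.py -- sketch
from setuptools import setup, find_packages

setup(
    name="project",            # hypothetical
    version="1.0",
    packages=find_packages(),  # finds every package (dir with __init__.py) under /dir/
)
```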
0
votes
1 answer

How does scrapy write to the log while running a spider?

While running a scrapy spider, I see that the log messages have "DEBUG:" lines such as: 1. DEBUG: Crawled (200) (GET http://www.example.com) (referer: None) 2. DEBUG: Scraped from (200 http://www.example.com) I want to know: 1. what do those…
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
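
For orientation, both messages come from Scrapy itself: the "Crawled (200)" line is logged when a response is received, and "Scraped from" when an item is yielded from a callback. A spider can add its own DEBUG lines through self.logger; a sketch:

```python
import scrapy

class LogDemoSpider(scrapy.Spider):
    name = "log_demo"  # placeholder
    start_urls = ["http://www.example.com"]

    def parse(self, response):
        # Produces a line like: DEBUG: parsing http://www.example.com
        self.logger.debug("parsing %s", response.url)
        # Yielding an item triggers the "Scraped from (200 ...)" DEBUG line.
        yield {"title": response.css("title::text").get()}
```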
0
votes
1 answer

Mock in production code

I'll try to be direct here: I'm a Ruby programmer, beginning with Python, and I'm trying something with Scrapy. I'm looking into code my company received from a third party, and one thing is breaking my Scrapyd deploy (on the cloud): ImportError:…
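
A likely reading of that traceback: the third-party code imports mock at module level, and the package is absent where scrapyd builds or runs the egg. mock has been part of the standard library as unittest.mock since Python 3.3, so the usual guard looks like this (a sketch):

```python
try:
    from unittest import mock  # standard library on Python 3.3+
except ImportError:
    import mock  # the PyPI backport; must be installed wherever scrapyd runs
```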
0
votes
1 answer

Installed Scrapyd on Rackspace and want to access it from a local machine

I have installed the Scrapyd service on Rackspace. I have a dedicated IP. I can access scrapyd only on the server itself [http://localhost:6800]. I want to access it from my local computer and deploy spiders later on. I need help accessing it like…
yasirnazir
  • 1,133
  • 7
  • 12
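
Scrapyd's listening address comes from its config file (one of the locations it reads is /etc/scrapyd/scrapyd.conf); a sketch that binds it to all interfaces so the dedicated IP works, with the caveat that the port should then be restricted by a firewall:

```
[scrapyd]
bind_address = 0.0.0.0   # listen on every interface, not just localhost
http_port    = 6800
```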