Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `scrapy` itself, but has since been split out into a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain via a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
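As a quick illustration of that JSON web service, here is a minimal sketch of scheduling a spider run (assuming a Scrapyd instance on localhost:6800 and a deployed project and spider named `myproject`/`myspider`; both names are placeholders):

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# schedule.json takes a form-encoded POST; "project" and "spider" are
# required, and any extra keys are passed through to the spider as arguments.
params = {"project": "myproject", "spider": "myspider"}
body = urlencode(params).encode()
req = Request("http://localhost:6800/schedule.json", data=body)

# urlopen(req) would return JSON like {"status": "ok", "jobid": "..."};
# left commented out here because it needs a running Scrapyd.
# response = urlopen(req)
```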

355 questions
2
votes
3 answers

How to password protect Scrapyd UI?

I have my website available to the public, and there is Scrapyd running at port 6800, like http://website.com:6800/. I do not want anyone to see the list of my crawlers. I know anyone can easily type in port 6800 and see what's going on. I have a few…
Umair Ayub
  • 19,358
  • 14
  • 72
  • 146
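Scrapyd itself historically shipped with no built-in authentication, so the usual fix is to bind it to localhost and put a reverse proxy with HTTP basic auth in front. A minimal nginx sketch (the paths and `.htpasswd` file are assumptions):

```nginx
server {
    listen 80;
    server_name website.com;

    location / {
        auth_basic           "Scrapyd";
        auth_basic_user_file /etc/nginx/.htpasswd;  # create with: htpasswd -c
        proxy_pass           http://127.0.0.1:6800;
    }
}
```

Pair this with `bind_address = 127.0.0.1` in scrapyd.conf so port 6800 is no longer reachable directly from outside.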
2
votes
1 answer

Scrapyd links do not work with HTTPS, just keeps loading and loading

I have scrapyd installed on Ubuntu. I also have a website with SSL enabled, and I need to make requests to links like https://IP_HERE:6800/listjobs.json?project=default from inside my website. But it looks like Scrapyd does not work with HTTPS. Even if I open…
Umair Ayub
  • 19,358
  • 14
  • 72
  • 146
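Scrapyd's web service speaks plain HTTP only, so hitting port 6800 over https:// will just hang; the common workaround is to terminate TLS at the web server and proxy a path through to Scrapyd. A sketch for nginx (the `/scrapyd/` prefix is an assumption):

```nginx
# inside the existing ssl-enabled server { } block
location /scrapyd/ {
    proxy_pass http://127.0.0.1:6800/;  # trailing slash strips the prefix
}
```

The site can then call e.g. https://website.com/scrapyd/listjobs.json?project=default over HTTPS while Scrapyd itself stays on plain HTTP.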
2
votes
1 answer

Error deploying scrapyd project

When trying to execute this command: scrapyd-deploy test -p project=myProject I get the following error: Traceback (most recent call last): File "/usr/bin/scrapyd-deploy", line 269, in main() File…
Slug
  • 319
  • 1
  • 2
  • 12
2
votes
0 answers

Scrapyd permissions 'nogroup' error

I've been struggling with the Scrapyd installation for some time, and even though in the end I seem to have managed, I still have some problems. I am getting an error when processing items. File…
AimiHat
  • 383
  • 4
  • 14
2
votes
2 answers

Start Scrapyd as a service

I would like to start scrapyd as a service, but when I start scrapyd and then close the SSH session, scrapyd stops automatically. When I try to start it as a service like this, I get an error: root@vps:~# service scrapyd start scrapyd: Failed to…
Pixel
  • 900
  • 1
  • 13
  • 31
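On systemd-based systems, a unit file keeps scrapyd running independently of the SSH session. A minimal sketch (the ExecStart path, user, and working directory are assumptions; check `which scrapyd` for the real binary path):

```ini
# /etc/systemd/system/scrapyd.service
[Unit]
Description=Scrapyd daemon
After=network.target

[Service]
ExecStart=/usr/local/bin/scrapyd
User=scrapyd
WorkingDirectory=/var/lib/scrapyd
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then `sudo systemctl daemon-reload && sudo systemctl enable --now scrapyd`. (Ubuntu 14.04 predates systemd and uses upstart, where an equivalent `/etc/init/scrapyd.conf` job would be needed instead.)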
2
votes
2 answers

Use Scrapyd with arguments

I'm using Scrapyd to run scrapy as a web service. I would like to use the curl command with parameters like this: curl http://myip:6800/schedule.json -d project=default -d spider=myspider -d domain=www.google.fr But I don't know how to get the…
Pixel
  • 900
  • 1
  • 13
  • 31
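Extra `-d key=value` pairs sent to schedule.json are forwarded to the spider's `__init__` as keyword arguments. A scrapy-free stand-in showing the pattern (a real spider would subclass `scrapy.Spider` and call `super().__init__(**kwargs)`):

```python
class MySpider:
    name = "myspider"

    def __init__(self, domain=None, **kwargs):
        # In a real spider: super().__init__(**kwargs)
        self.domain = domain
        self.start_urls = [f"http://{domain}/"] if domain else []

# Scrapyd instantiates the spider roughly like this for:
#   curl http://myip:6800/schedule.json -d project=default \
#        -d spider=myspider -d domain=www.google.fr
spider = MySpider(domain="www.google.fr")
```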
2
votes
1 answer

Where does scrapyd write crawl results when using an S3 FEED_URI, before uploading to S3?

I'm running a long-running web crawl using scrapyd and scrapy 1.0.3 on an Amazon EC2 instance. I'm exporting jsonlines files to S3 using these parameters in my spider/settings.py file: FEED_FORMAT: jsonlines FEED_URI: s3://my-bucket-name My…
bds914
  • 23
  • 2
2
votes
1 answer

Portia Spider logs showing ['Partial'] during crawling

I have created a spider using Portia web scraper and the start URL is https://www1.apply2jobs.com/EdwardJonesCareers/ProfExt/index.cfm?fuseaction=mExternal.searchJobs While scheduling this spider in scrapyd I am getting DEBUG: Crawled (200)
Prabhakar
  • 1,138
  • 2
  • 14
  • 30
2
votes
1 answer

scrapyd-deploy error: No module named project.models

My scrapy crawl command works well, but when I try to deploy with scrapyd I run into problems. scrapyd-deploy -p I tried on my Mac and on a remote server (CentOS), but both gave an error: Deploying to project "start" in…
user2492364
  • 6,543
  • 22
  • 77
  • 147
2
votes
2 answers

How to quit selenium driver when spider is closed

I have a spider where I have to use Selenium to scrape dynamic data on page. Here's what it looks like: class MySpider( name = 'myspider' start_urls = ['http://example.org'] def __init__(self, *args, **kwargs): super(,…
Dmitrii Mikhailov
  • 5,053
  • 7
  • 43
  • 69
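Scrapy automatically calls a spider method named `closed(reason)` when the spider finishes, which is a natural place to quit a Selenium driver. A stand-in without the scrapy/selenium imports, to show the shape:

```python
class MySpider:
    name = "myspider"

    def __init__(self, *args, **kwargs):
        # In the real spider this would be e.g. selenium.webdriver.Firefox()
        self.driver = None

    def closed(self, reason):
        # Called once by Scrapy when the spider closes, whatever the reason.
        if self.driver is not None:
            self.driver.quit()
```

An alternative is connecting a handler to the `spider_closed` signal in `from_crawler`, which amounts to the same thing.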
2
votes
0 answers

How to get dummy scrapy stats count in scrapyd

How do I get the "DummyStatsCollector" in scrapyd? I have studied this link: "http://doc.scrapy.org/en/latest/topics/stats.html#dummystatscollector", but there is no clear explanation of getting scrape stats in scrapyd. I would like to be…
Karthick
  • 55
  • 8
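For reference, switching Scrapy to the DummyStatsCollector is a one-line settings change; note it makes stat collection a no-op, so under Scrapyd there would be nothing to read back (to inspect stats per job, keep the default MemoryStatsCollector and log or export the stats yourself):

```python
# settings.py
# Disables stat collection entirely (all set/inc calls become no-ops).
STATS_CLASS = "scrapy.statscollectors.DummyStatsCollector"
```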
2
votes
1 answer

Fatal Error when launching scrapyd in terminal

I just installed scrapyd on Ubuntu 14.04 and after the installation, I simply typed "scrapyd" into the command line and got: Failed to load application: No module named txweb I used both the general and Ubuntu-specific installations and neither…
loremIpsum1771
  • 2,497
  • 5
  • 40
  • 87
2
votes
0 answers

How do I write a form equivalent to this curl to send requests to my scrapyd instance?

curl http://localhost:6800/addversion.json -F project=firstproject -F version=r2 -F egg=@/tmp/scrapydeploy-ffa46x/project-1.0-py2.7.egg Response: {"status": "ok", "project": "firstproject", "version": "r2", "spiders": 2} I have tried
michaelAdam
  • 1,119
  • 14
  • 30
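curl's `-F` flag sends `multipart/form-data`, so the literal form equivalent of that command is a multipart HTML form with a file input for the egg (the action URL is assumed to be reachable from the page serving the form):

```html
<form action="http://localhost:6800/addversion.json" method="post"
      enctype="multipart/form-data">
  <input type="text" name="project" value="firstproject">
  <input type="text" name="version" value="r2">
  <input type="file" name="egg">
  <button type="submit">Deploy</button>
</form>
```

Submitting it should yield the same JSON response as the curl command, assuming the browser is allowed to POST cross-origin to the Scrapyd host.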
2
votes
0 answers

Transfer data from one rabbitmq consumer to other consumer

I'm presently working on an Ubuntu (14.04) system. In my project there are three servers apart from the broker. The core is the Flask server. The other two are the Scrapyd and Sentiment Analysis servers. Using the 'Work Queues' tutorial, I have managed…
Vivek Anand
  • 621
  • 1
  • 7
  • 15
2
votes
1 answer

Scrapy encountered http status <521>

I am new to scrapy, and tried to crawl a website page but got back HTTP status code <521>. Does that mean the server refuses the connection? (I can open it in a browser.) I tried using cookie settings, but it still returned 521. Question: what's…
Alpha Shen
  • 21
  • 2
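HTTP 521 is a Cloudflare-specific status ("web server is down"), often returned to clients that the anti-bot layer rejects rather than by the site itself. Two hedged settings-level mitigations (the user-agent string is a placeholder):

```python
# settings.py
# Present a browser-like User-Agent instead of Scrapy's default.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"

# Let Scrapy's RetryMiddleware also retry 521 responses
# (default codes plus 521 appended).
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429, 521]
```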