Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It used to be part of `scrapy` itself, but was separated out and is now a standalone project. It runs on a machine and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects, and each project can have multiple versions uploaded, but only the latest version is used for launching new spiders.
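
As a rough illustration of the JSON web service described above, the following sketch builds and sends a request to Scrapyd's `schedule.json` endpoint using only the Python standard library. It assumes a Scrapyd instance on its default port at `localhost:6800`; the project and spider names (`myproject`, `myspider`) and the helper function names are hypothetical placeholders, not part of Scrapyd.

```python
import json
from urllib import parse, request

# Assumption: Scrapyd is running locally on its default port.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, base_url=SCRAPYD_URL, **spider_args):
    """Build (but do not send) a POST request for schedule.json.

    Extra keyword arguments are sent as additional form fields,
    which Scrapyd passes on to the spider as arguments.
    """
    fields = {"project": project, "spider": spider, **spider_args}
    body = parse.urlencode(fields).encode("utf-8")
    return request.Request(f"{base_url}/schedule.json", data=body, method="POST")


def schedule(project, spider, **spider_args):
    """Send the request and return the decoded JSON reply.

    Requires a reachable Scrapyd instance with the project already deployed.
    """
    req = build_schedule_request(project, spider, **spider_args)
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With a deployed project, `schedule("myproject", "myspider")` would return Scrapyd's JSON reply as a Python dict. Splitting request construction from sending keeps the first function usable without a running server.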

355 questions
1
vote
0 answers

Usage of scrapyd instead of scrapy

Using scrapyd I can run scrapy on multiple cores. This is the call I make using scrapy: scrapy crawl buch. According to the information there is no multiprocessor usage: Scrapy does not use multithreading and will not use more than one core. If…
Ploetzeneder
1
vote
1 answer

Unable to telnet into scrapy

I have had a crawler running for a few days. I want to pause the crawler in order to do something else on the system. The Scrapy documentation says this can be done using the telnet console, but I am unable to log in to the telnet console. Here are the processes…
Mayank Jaiswal
1
vote
0 answers

How to send data from scrapy to flask server?

I have a flask server running on localhost:5000 and a scrapyd running on localhost:6800 with only one spider. My problem is that I have scraped some data and I want to send it to the flask server for further processing. If…
Vivek Anand
1
vote
1 answer

Scrapyd error - AttributeError: 'NoneType' object has no attribute 'module_name'

This is my setup.py, generated by running scrapyd-deploy. My project is named crawler and exists in the crawler directory. # Automatically created by: scrapyd-deploy from setuptools import setup, find_packages setup( name = 'project', …
wolfgang
1
vote
0 answers

scrapyd exceptions.OSError: [Errno 20] Not a directory

I'm using django-dynamic-scrapy with scrapyd and am confused by this error (in the last line, ScrapyDemo is my scrapy project name). Can somebody help me, please? 2015-08-16 20:18:31+0800 [Launcher,12972/stderr] Unhandled error in Deferred: 2015-08-16…
leon
1
vote
1 answer

Import functions in a different file while using scrapyd

I am building a crawler using scrapy and run it on a remote machine running scrapyd. My spider is quite long and complex, so I put some functions in a different file, def.py. My project looks like this: ./ |-- my_project | |-- __init__.py | …
bwayne
1
vote
0 answers

How to get URL from Crawled instead of Scraped from in Portia spider deployment?

I am deploying a Portia spider in scrapyd. While deploying, I am passing URLs for every link parsing. Example: the URL (say URL_1) crawled by the spider is http://www.example.com/query1 and the URL (say URL_2) I am passing is…
Prabhakar
1
vote
1 answer

unable to deploy portia project using scrapyd-deploy due to 'No module found ..'

I am evaluating portia and ran into an issue deploying to scrapyd. When I try to deploy my portia project using scrapyd-deploy local -p new_project from my portia project directory, I get the following error message: Packing version…
Rig
1
vote
0 answers

Scrapyd error when trying to schedule a job

When I try to schedule a job after I have deployed a project, I get the following error: Traceback (most recent call last): File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 18, in render return JsonResource.render(self,…
jsjc
1
vote
0 answers

scrapyd jobs are not running and some scrapyd jobs are not finishing

I have deployed some jobs on a couple of machines with crontab. Recently, I have had two problems. 1. On one of these machines, jobs are not running and have stayed in the pending queue for many days. 2. Unlike the problem above, some jobs…
zhang wei
1
vote
0 answers

scrapyd has error : exceptions.TypeError: is not JSON serializable

I use scrapyd to run scrapy jobs, and I get an error. I don't know why this happens. Please help me. ERROR: Error caught on signal handler:
user2492364
1
vote
1 answer

Difference between Celery and Scrapyd

I have built a small scrapy spider using Portia. I have deployed it in Scrapyd and it is working fine. After searching, I found that we can use Celery to schedule the spider. What actually is the difference between Scrapyd and Celery? Can anyone…
backtrack
1
vote
1 answer

Request working in CURL but not in Ajax

I have a Scrapyd server running and am trying to schedule a job. When I try the command below using CURL, it works fine: curl http://XXXXX:6800/schedule.json -d project=stackoverflow -d spider=careers.stackoverflow.com -d setting=DOWNLOAD_DELAY=2 -d…
backtrack
1
vote
2 answers

Scrapyd deploy project on a server with dynamic ip

I want to deploy my scrapy project on an IP that is not listed in the scrapy.cfg file, because the IP can change and I want to automate the deployment process. I tried giving the IP of the server directly in the deploy command, but it did not work.…
Omair Shamshir
1
vote
0 answers

Scrapy spider stats webservice JSON format

How would one go about implementing live feedback on what the spider is doing? I have the logs that tell me pages per second and items each second, but I would like to access that via a webservice. Maybe I'm wrong, but I found in the scrapy docs…
Mihai