Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was once part of `scrapy` itself but is now maintained as a standalone project. It runs as a service on a machine and lets you deploy (i.e., upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
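
For example, scheduling a crawl is a single HTTP call (the host and names below are the scrapyd defaults that recur across the questions on this page):

    curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider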

355 questions
0 votes · 1 answer

Sharing visited urls between multiple spiders in scrapy?

I am using scrapyd to run multiple spiders as jobs across the same domain. I assumed scrapy had a hashtable of visited urls that it shared and co-ordinated with other spiders when it crawled. When I create instances of the same spider by curl…
Sai (113)

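Background for this one: scrapyd launches every scheduled job as a separate process, so the default request dupefilter is per-job and no visited-URL table is shared between spiders. Below is a sketch of one workaround, assuming a reachable Redis server and the redis-py package; the class, module and key names are made up for illustration:

    # Hypothetical shared dupefilter: request fingerprints go into one
    # Redis set, so all jobs scheduled through scrapyd consult the same
    # "seen" collection instead of each keeping its own.
    import redis
    from scrapy.dupefilters import RFPDupeFilter      # scrapy.dupefilter in 0.x
    from scrapy.utils.request import request_fingerprint

    class RedisDupeFilter(RFPDupeFilter):
        def __init__(self, path=None, debug=False):
            super(RedisDupeFilter, self).__init__(path, debug)
            self.server = redis.StrictRedis()         # assumes Redis on localhost

        def request_seen(self, request):
            fp = request_fingerprint(request)
            # SADD returns 0 when the fingerprint was already present,
            # i.e. some spider has seen this request before.
            return self.server.sadd('seen_requests', fp) == 0

Enable it with DUPEFILTER_CLASS = 'myproject.dupefilters.RedisDupeFilter' in the project settings (module path hypothetical).
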
0 votes · 1 answer

Error when deploying scrapy project on the scrapy cloud

I am using scrapy 0.20 with Python 2.7 and want to deploy my scrapy project to scrapy cloud. I developed the project with a simple spider, navigated to the project folder, and typed scrapy deploy scrapyd -d koooraspider on cmd, where koooraspider…
William Kinaan (28,059)

0 votes · 1 answer

Running more than one spider one by one

I am using the Scrapy framework to make spiders crawl through some webpages. Basically, what I want is to scrape web pages and save them to a database. I have one spider per webpage, but I am having trouble running those spiders at once such that a spider…
Nabin (11,216)

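scrapyd's own job queue is one answer to exactly this: every call to schedule.json enqueues a job, and scrapyd starts queued jobs only as process slots free up, so capping the slot count makes spiders run strictly one after another. A minimal scrapyd.conf sketch, using scrapyd's standard option names:

    [scrapyd]
    max_proc = 1        # at most one crawl process at a time
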
0 votes · 2 answers

Issues with the installation of scrapyd on Windows

I am having issues with the installation of scrapyd on Windows 7. I have installed the package using easy_install, but the scrapyd command still comes up with nothing. Here is the output of my install: C:\Python27\Lib\site-packages\scrapy>easy_install…
eboni (883)

0 votes · 1 answer

Scrapyd: pass parameters when deploying

This is a simple example of a scrapy.cfg file: [settings] default = crawly.settings [deploy:s1] url = http://localhost:6800 project = my_project I want to know if I can pass any parameters to my scrapyd instance using this file. What I want to do is…
AliBZ (4,039)

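Laid out as it appears in the file, the scrapy.cfg quoted in that question is:

    [settings]
    default = crawly.settings

    [deploy:s1]
    url = http://localhost:6800
    project = my_project
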
0 votes · 1 answer

Log for scrapyd installed with pip

I installed scrapyd with pip, and I don't have a '/var/log/scrapyd' dir. I'm trying to find out what's happening to my http call, since I get an 'OK' status when I initiate it, but no log is generated in 'logs/project/spider/' (and according to…
Jean Ventura (27)

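Worth knowing here: a pip install ships no init script or packaged /etc config, so the /var/log/scrapyd layout created by the distro package never appears; scrapyd instead reads a scrapyd.conf from its usual search locations, including the directory it is started from, and writes logs wherever logs_dir points. A sketch:

    [scrapyd]
    logs_dir = /var/log/scrapyd    # any writable directory works
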
0 votes · 0 answers

Scrapyd: how to override spider name using cmd arguments

I am using scrapyd (the project is deployed on an AWS EC2 instance) with a spider that accepts a seed url to start. I want to run the spider under a different name each time, so that I can manage items and logs easily on the EC2 instance. Locally I can do it like this: crawl…
Tasawer Nawaz (927)

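The spider's name is fixed by the deployed code, so it can't be swapped per run; but schedule.json does take an optional jobid plus arbitrary key=value pairs that reach the spider as arguments, which covers the same log and item bookkeeping. A sketch (host, names and the seed_url argument are illustrative):

    curl http://localhost:6800/schedule.json \
        -d project=myproject \
        -d spider=myspider \
        -d jobid=ec2-run-001 \
        -d seed_url=http://example.com/
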
0 votes · 2 answers

Scrapy recursively scraping craigslist

I am using scrapy to scrape craigslist and get all links, go to each link, and store the description of each page and the email for reply. Now I have written a scrapy script which goes through craigslist/sof.com and gets all job titles and urls. I want…
Scooby (3,371)

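A sketch of the usual pattern for this kind of two-level crawl: a CrawlSpider follows listing links and hands each detail page to a callback. The domain, URLs and selectors below are placeholders, not the asker's code:

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor

    class ListingsSpider(CrawlSpider):
        name = 'listings'
        allowed_domains = ['example.org']          # placeholder domain
        start_urls = ['http://example.org/jobs/']  # placeholder listing page

        # Follow every link under /jobs/ and parse each linked page.
        rules = (
            Rule(LinkExtractor(allow=r'/jobs/'), callback='parse_item'),
        )

        def parse_item(self, response):
            yield {
                'title': response.css('title::text').get(),
                'url': response.url,
            }
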
0 votes · 1 answer

Getting spider instance from scrapyd

Is there a way to get the instance of the spider that runs when you schedule a run using scrapyd? I need to access the spider's attributes outside the run and can't use a json/csv file to do this.
Jean Ventura (27)

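A sketch of one workaround, not from the question itself: scrapyd runs each job in its own process, so the caller never sees the live spider object; instead, persist the needed attributes when the crawl ends, via the spider's closed() hook, and read them back afterwards (filename and attribute here are hypothetical):

    import json
    import scrapy

    class MySpider(scrapy.Spider):
        name = 'myspider'
        start_urls = ['http://example.com/']   # placeholder
        last_title = None

        def parse(self, response):
            self.last_title = response.css('title::text').get()

        def closed(self, reason):
            # Called when the crawl ends; dump attributes for external code.
            with open('spider_state.json', 'w') as f:
                json.dump({'last_title': self.last_title, 'reason': reason}, f)
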
0 votes · 1 answer

How does scrapyd determine the 'latest' version of a project?

According to the documentation, when deploying a project to scrapyd, I can use the git commit hash as the version by doing this: $ scrapyd-deploy default -p myproject --version GIT The documentation also says that scrapyd can keep multiple…
Kal (1,707)

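Scrapyd picks the "latest" by sorting the uploaded version strings (the classic egg storage reportedly compares them LooseVersion-style, which is why purely numeric or r<number> schemes sort most predictably). The ordering can be inspected directly; per the scrapyd API, the last entry is the version in use:

    curl "http://localhost:6800/listversions.json?project=myproject"
    # example response shape: {"status": "ok", "versions": ["r97", "r142"]}
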
0 votes · 1 answer

How do I call spiders from different projects with different pipelines from a python script?

I have three different spiders in different scrapy projects, called REsale, REbuy and RErent, each with their own pipeline that directs their output to various MySQL tables on my server. They all run OK when called using scrapy crawl. Ultimately,…
Mark (195)

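One way to do this with scrapyd rather than in-process machinery: deploy each project to a scrapyd instance, then a plain script schedules them over HTTP. A sketch; the project names come from the question, while the spider names, URL and the requests dependency are assumptions:

    import requests

    SCRAPYD = 'http://localhost:6800'

    jobs = [
        ('REsale', 'resale_spider'),   # spider names are placeholders
        ('REbuy',  'rebuy_spider'),
        ('RErent', 'rerent_spider'),
    ]

    for project, spider in jobs:
        # Each POST enqueues one job; scrapyd runs it with that
        # project's own settings and pipelines.
        r = requests.post(SCRAPYD + '/schedule.json',
                          data={'project': project, 'spider': spider})
        print(project, r.json())       # e.g. {'status': 'ok', 'jobid': '...'}
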
0 votes · 1 answer

Scrapyd: POST schedule.json from asp.net

I have scrapyd and a spider installed on a Unix machine, and everything works fine when I run curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider I can see the job status, logs and items on the web interface of scrapyd…
Syed Waqas (862)

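Since schedule.json is a plain form-encoded HTTP POST, any stack that can emit the request below can do what that curl line does; shown as raw HTTP with the same host and field values as the question:

    POST /schedule.json HTTP/1.1
    Host: localhost:6800
    Content-Type: application/x-www-form-urlencoded

    project=myproject&spider=somespider
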
0 votes · 1 answer

How to install the latest Scrapyd package?

I notice that the latest stable version of scrapy was released last week (2013-08-09). After updating scrapy to version 0.18, the previously installed scrapyd-0.17 was uninstalled by apt-get (Ubuntu 12.04) automatically. Is there a scrapyd-0.18? How to…
kev (155,172)

0 votes · 1 answer

How to install scrapyd on FreeBSD

I am trying to install scrapyd on FreeBSD, but I am getting this error: $ cd /usr/ports/www/py-scrapyd/ && sudo make install clean -bash: cd: /usr/ports/www/py-scrapyd/: No such file or directory I have installed scrapy using this command: $ cd…
Vaibhav Jain (5,287)

0 votes · 1 answer

Run Scrapy on IIS

I have an IIS server, and on it I have an ASP.NET MVC application. The MVC application will revolve around scraped data. Is there a way I can run Scrapy (a tool built in Python) on IIS, similar to how we can run PHP and WordPress on IIS?
J86 (14,345)