Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was once part of `scrapy` itself, but was split out and is now a standalone project. It runs on a server and lets you deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
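
Once a project is deployed, a spider run can be scheduled through that JSON API. A minimal sketch using Python's `requests` library, assuming a default Scrapyd instance on localhost:6800 and placeholder project/spider names:

```python
import requests

# schedule.json is Scrapyd's standard endpoint for launching a spider
# from a deployed project; "myproject" and "myspider" are placeholders.
resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "myproject", "spider": "myspider"},
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}
```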

355 questions
3 votes · 1 answer

Scrapy managing dynamic spiders

I am building a project where I need a web crawler that crawls a list of different webpages. This list can change at any time. What is the best way to implement this with Scrapy? Should I create one spider for all websites or dynamically create spiders? I…
MaxLudv
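
One common pattern for this question is a single generic spider that receives its URL list as a spider argument at schedule time, so the list can change without redeploying. A sketch with hypothetical names:

```python
import scrapy

class SiteSpider(scrapy.Spider):
    name = "site"

    def __init__(self, start_urls="", *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Comma-separated URL list supplied when the job is scheduled,
        # e.g. schedule.json with -d start_urls=https://a.com,https://b.com
        self.start_urls = [u for u in start_urls.split(",") if u]

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}
```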
3 votes · 1 answer

How to run scrapy server as a daemon

I am trying to run the Scrapy server as a daemon. I followed this link, but I get an error whenever I run this command: sudo apt-get install scrapyd-0.14 Reading package lists... Done Building dependency tree Reading state information...…
Vaibhav Jain
3 votes · 2 answers

Scrapyd: How to specify libs and common folders that deployed projects can use?

Scrapyd is a service to which we can deploy our projects as eggs. However, I am facing a problem. I have a project named MyScrapers whose spider classes use an import statement as follows: from mylibs.common.my_base_spider import MyBaseSpider. The path to…
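
A common fix is to list the shared package in the project's setup.py so that scrapyd-deploy bundles it into the egg. A sketch, assuming mylibs/ sits next to the MyScrapers package:

```python
# setup.py (sketch) -- scrapyd-deploy builds the deployed egg from this
from setuptools import setup, find_packages

setup(
    name="MyScrapers",
    version="1.0",
    packages=find_packages(),  # picks up both MyScrapers and mylibs
    entry_points={"scrapy": ["settings = MyScrapers.settings"]},
)
```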
3 votes · 3 answers

scrapy log issue

I have multiple spiders in one project. The problem is that right now I am defining LOG_FILE in settings like LOG_FILE = "scrapy_%s.log" % datetime.now(), but what I want is scrapy_SPIDERNAME_DATETIME, and I am unable to include the spider name in the log file name. I…
akhter wahab
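
Since settings.py is evaluated once for the whole project, it cannot see the spider name; one workaround is Scrapy's per-spider custom_settings, as in this sketch:

```python
from datetime import datetime

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"  # placeholder
    # custom_settings is applied per spider class, so the spider name can
    # be baked into the log file (the timestamp is fixed at import time).
    custom_settings = {
        "LOG_FILE": "scrapy_%s_%s.log"
        % (name, datetime.now().strftime("%Y%m%d_%H%M%S")),
    }
```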
3 votes · 1 answer

Running multiple spiders using scrapyd

I have multiple spiders in my project, so I decided to run them by uploading them to the Scrapyd server. I uploaded my project successfully, and I can see all the spiders when I run the command curl…
Shiva Krishna Bavandla
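
For the multi-spider case, one approach is to ask Scrapyd for the project's spider list and schedule each one. A sketch against the standard listspiders.json and schedule.json endpoints, with a placeholder project name:

```python
import requests

BASE = "http://localhost:6800"  # assumed Scrapyd address
PROJECT = "myproject"           # placeholder project name

spiders = requests.get(
    f"{BASE}/listspiders.json", params={"project": PROJECT}
).json()["spiders"]

for spider in spiders:
    job = requests.post(
        f"{BASE}/schedule.json", data={"project": PROJECT, "spider": spider}
    ).json()
    print(spider, "->", job.get("jobid"))
```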
2 votes · 2 answers

Change scrapy settings via api

I use Scrapy and Scrapyd and send some custom settings via the API (with Postman). For example, I send the value of start_urls through the API and it works correctly. The problem is that I cannot apply the settings that I…
Sardar
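
Scrapyd treats unknown POST keys as spider arguments (which is why start_urls works), while Scrapy settings must be sent as repeated setting=KEY=VALUE pairs. A sketch, with placeholder names:

```python
import requests

# A list of tuples allows the repeated "setting" key that Scrapyd expects.
requests.post(
    "http://localhost:6800/schedule.json",
    data=[
        ("project", "myproject"),               # placeholder
        ("spider", "myspider"),                 # placeholder
        ("setting", "DOWNLOAD_DELAY=2"),        # Scrapy setting override
        ("setting", "CONCURRENT_REQUESTS=4"),   # another override
        ("start_urls", "https://example.com"),  # passed as a spider argument
    ],
)
```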
2 votes · 3 answers

scrapyd error while deploying with scrapyd-client

While executing the command scrapyd-deploy default, I'm running into an error saying: File "/home/user/miniconda3/envs/quickcompany/lib/python3.8/site-packages/scrapyd_client/deploy.py", line 23, in from scrapy.utils.http import…
2 votes · 2 answers

How to distribute spiders across the cluster using Scrapyd and ScrapydWeb?

I am working on a crawling project using Scrapy, and I need to distribute my spiders across different nodes in a cluster to make the process faster. I am using ScrapydWeb to manage it, and I have already configured two machines, one of them with…
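
In ScrapydWeb the node list is declared in its settings file through the SCRAPYD_SERVERS option; each node runs its own Scrapyd bound to an externally reachable address. A sketch with placeholder hosts:

```python
# ScrapydWeb settings file (sketch) -- hostnames are placeholders
SCRAPYD_SERVERS = [
    "127.0.0.1:6800",     # local node
    "192.168.1.20:6800",  # second node; its scrapyd must bind to 0.0.0.0
]
```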
2 votes · 3 answers

Running Scrapyd as a daemon on centos 6.10 python 3.6

I am trying to run my scrapers on my dedicated CentOS 6.10 server. I have Python 3.6.6 installed, created a venv, and installed and ran scrapyd from a pip install. The command scrapyd shows this: 2018-10-24T12:23:56-0700 [-] Loading…
Pixelknight1398
2 votes · 1 answer

Using scrapy with scrapyd in Django not entering def(parse)

I'm still learning Scrapy, and I am trying to use it with Scrapyd inside a Django project. But I am noticing that the spider just won't enter parse(). import scrapy from scrapy.linkextractors import LinkExtractor from scrapy.spiders import…
Vira Xeva
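
Given the CrawlSpider-style imports in the excerpt, a frequent cause is overriding parse() on a CrawlSpider: that class uses parse() internally for rule handling, so the callback needs a different name. A sketch with placeholder names:

```python
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class ExampleSpider(CrawlSpider):
    name = "example"                      # placeholder
    start_urls = ["https://example.com"]  # placeholder

    # CrawlSpider implements parse() itself; defining your own parse()
    # silently bypasses the rules, so use a differently named callback.
    rules = (Rule(LinkExtractor(), callback="parse_item", follow=True),)

    def parse_item(self, response):
        yield {"url": response.url}
```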
2 votes · 1 answer

Scrapy: Running multiple spiders at scrapyd - Python logical error

Scrapy 1.4. I am using this script (Run multiple scrapy spiders at once using scrapyd) to schedule multiple spiders on Scrapyd. Before, I was using Scrapy 0.19 and it was running fine. Now I am receiving the error: TypeError: create_crawler() takes exactly…
Ailton
2 votes · 0 answers

How many scraping requests can ScrapyRT handle

I have a single project with a web interface where I need to download from 3,000 to 20,000 URLs per week (or month). I use a ticket system to show progress: what has been downloaded, what is pending download, which URLs have timeout errors, and…
amarynets
2 votes · 1 answer

How can I run Scrapyd on a server

Scrapinghub recently dropped periodic jobs from their free package, which is what I used to run my Scrapy crawlers. Therefore, I decided to use Scrapyd instead. So I went ahead and got a virtual server running Ubuntu 16.04. (This is…
Sebastian
2 votes · 1 answer

spider = cls(*args, **kwargs) TypeError: __init__() got an unexpected keyword argument '_job'

I am trying to use Scrapyd for scraping with Selenium and WebDriver. The spider works fine with the command "scrapy crawl myspider", but when I deploy it using Scrapyd and schedule it using curl and the Scrapyd API, it triggers the unexpected keyword…
Abdul Qadir R.
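
Scrapyd passes an extra _job keyword argument when it starts a run, so a spider that defines its own __init__ must accept and forward arbitrary kwargs. A sketch:

```python
import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"  # placeholder

    def __init__(self, *args, **kwargs):
        # Forward **kwargs so Scrapyd's _job (and any spider arguments)
        # reach scrapy.Spider.__init__ instead of raising a TypeError.
        super().__init__(*args, **kwargs)
        self.driver = None  # e.g. create the Selenium webdriver lazily
```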
2 votes · 0 answers

scrapyd deploy job fails with unable to open database file

I am running a batch of 500 crawl jobs on Scrapyd, fired from a shell script. I am having this issue locally on a Mac as well as on an EC2 instance. These crawl jobs have been working fine with a batch of 100; however, when I run it for…
vishal