Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `scrapy` itself, but was split out and is now maintained as a standalone project. It runs on a machine and allows you to deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
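The JSON web service mentioned above is plain HTTP; a minimal sketch of building a request for Scrapyd's `schedule.json` endpoint with only the standard library, assuming a Scrapyd instance at the default `http://localhost:6800` and hypothetical project/spider names:

```python
from urllib.parse import urlencode
from urllib.request import Request

SCRAPYD = "http://localhost:6800"  # default Scrapyd address (assumption)

def schedule_request(project, spider, **spider_args):
    """Build the POST request for Scrapyd's schedule.json endpoint.

    Extra keyword arguments are passed through as spider arguments.
    """
    body = urlencode({"project": project, "spider": spider, **spider_args})
    return Request(f"{SCRAPYD}/schedule.json", data=body.encode())

req = schedule_request("myproject", "myspider")  # hypothetical names
print(req.full_url)       # http://localhost:6800/schedule.json
print(req.data.decode())  # project=myproject&spider=myspider
```

To actually fire the job, pass the request to `urllib.request.urlopen` (or use any HTTP client such as `curl`) against a running Scrapyd instance.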

355 questions
0
votes
0 answers

Spider not crawling in scrapyd but works using scrapy crawl

I was tasked at work with pulling a ton of inventory data for a company that was acquired by the company I work for. I thought Scrapy would be a great tool for this, and so far I'm having fun. However, I am trying to utilize scrapyd and am running into a…
waltmagic
  • 631
  • 2
  • 9
  • 22
0
votes
0 answers

How do I use sudo in scrapyd

On CentOS, I want a regular user account to be able to use packages installed for the root account. When I run sudo scrapyd, the message "sudo: scrapyd: command not found" appears. How can I solve this problem? Thanks a lot for your help
WenTingTseng
  • 51
  • 1
  • 1
  • 2
0
votes
1 answer

Is there a way to handle dynamically loaded selectors with clicks on pages in scrapy-playwright?

I have a use case like this. Suppose I crawl a website abc.com using scrapy-playwright; the pages it loads are of 3 different types, like page1->#selector1, page2->#selector2, page3->#selector3, and the selector changes dynamically, so there is no…
0
votes
0 answers

Cannot deploy spiders with scrapyd-deploy (shows 0 spiders)

While attempting to deploy spiders to scrapyd running locally, I am getting the following response: {"node_name": "a-38u3442zr18hl", "status": "ok", "project": "project_name", "version": "1662563564", "spiders": 0} I have 6 spiders which are…
Tim_B
  • 129
  • 1
  • 1
  • 10
0
votes
0 answers

Scrapyd Spiders are going missing every 24 hours

I have a Scrapyd server in Heroku. It works fine, and the spider runs and connects to databases without any issue. I have set it to run every day via the scheduler in the ScrapydWeb UI. However, every day the spider seems to disappear and I would have to…
0
votes
1 answer

scrapyd deployed in KubeSphere, and when running scrapy with selenium got exception: 'twisted.internet.error.ReactorAlreadyInstalledError'

I deployed scrapyd in KubeSphere, and I got an exception when running scrapy with selenium: 2022-03-16T12:57:15+0000 [Launcher,1832/stderr] return Crawler(spidercls, self.settings, init_reactor=True) File…
何元斌
  • 1
  • 1
0
votes
2 answers

Scrapyd spiders are finished but they are still being shown as Running on WebUI as well as listjobs.json

I have deployed Scrapyd as a Docker container on Google Cloud Run. On my local machine, when I run the container, everything works fine. But when I deploy the same container on Google Cloud Run, spider jobs are not removed from the Running queue. Though…
0
votes
1 answer

How to resolve Python Module Not Found Error?

I am facing a problem importing modules in Python. I looked for a solution and found this, but it did not work either. My directory is as follows ->MyScrapper --->MyScrapper ----->db_connection.py --->Video_Scrapper ----->…
Asad Mehmood
  • 292
  • 2
  • 10
  • 20
0
votes
1 answer

How to pass custom settings arguments to scrapyd

I am trying to run several spiders on a cron-based schedule. I want to use scrapyd arguments to run the same spider with different settings. What I am trying to do: I set RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408] and a…
gerrel93
  • 89
  • 6
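On the question above: Scrapyd's schedule.json endpoint accepts repeated `setting` parameters of the form `NAME=value` to override Scrapy settings per run. A sketch of encoding such a POST body with the standard library (project/spider names hypothetical):

```python
from urllib.parse import urlencode

def schedule_body(project, spider, settings=None):
    """Encode a schedule.json POST body.

    `settings` maps Scrapy setting names to values; each entry is
    emitted as a repeated `setting=NAME=value` parameter, which is
    how Scrapyd expects per-job setting overrides.
    """
    pairs = [("project", project), ("spider", spider)]
    for name, value in (settings or {}).items():
        pairs.append(("setting", f"{name}={value}"))
    return urlencode(pairs)

body = schedule_body(
    "myproject", "myspider",  # hypothetical names
    {"RETRY_HTTP_CODES": "[500, 503, 504, 400, 403, 404, 408]"},
)
print(body)
```

Note that `urlencode` percent-encodes the inner `=` and the brackets, which is what Scrapyd expects in a form-encoded body.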
0
votes
1 answer

How to bind Heroku port to scrapyd

I created a simple Python app on Heroku to launch scrapyd. The scrapyd service starts, but it listens on port 6800. Heroku requires you to bind it to the $PORT variable, and I was able to run the Heroku app locally. The logs from the process are…
Matts
  • 1,301
  • 11
  • 30
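For context on the port-binding question above: Scrapyd takes its listening port from the `http_port` option in its config file rather than from the environment, so a common workaround on platforms that inject `$PORT` at runtime is a small launcher that writes the config before starting the daemon. A sketch under that assumption:

```python
import os

def render_scrapyd_conf(port):
    """Render a minimal scrapyd.conf binding Scrapyd to the given port.

    Scrapyd reads bind_address and http_port from the [scrapyd]
    section of its config file, not from environment variables.
    """
    return f"[scrapyd]\nbind_address = 0.0.0.0\nhttp_port = {port}\n"

if __name__ == "__main__":
    port = os.environ.get("PORT", "6800")  # Heroku injects PORT at runtime
    with open("scrapyd.conf", "w") as f:
        f.write(render_scrapyd_conf(port))
    # then start the daemon, e.g. os.execvp("scrapyd", ["scrapyd"])
```

The launcher would be referenced from the Procfile in place of invoking `scrapyd` directly.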
0
votes
1 answer

scrapyd-deploy not deploying project? Python Scrapy

I have set up a Scrapyd server. When I run the command scrapyd-deploy, it shows me the following message, but when I navigate to http://10.2.1.54:6800/jobs it shows nothing. Am I missing something?
Asad Mehmood
  • 292
  • 2
  • 10
  • 20
0
votes
1 answer

Unable to connect to Scrapyd web interface in Docker container from host

I have Django and Scrapyd running in separate containers. Django works just fine on my host's localhost:8001, but when I try to access Scrapyd via localhost:8000 I get a "connection reset" error from Firefox. I'm trying to connect both containers…
0
votes
1 answer

How to callback a Scrapy spider method upon receiving SIGINT or Scrapyd's cancel.json call?

In Scrapy, when we press CTRL+C we see "Received SIGINT, shutting down gracefully. Send again to force" in the logs, which comes from this code. Likewise, when we request Scrapyd's cancel.json, the following code is executed. I want to catch those signals in my Spider,…
Umair Ayub
  • 19,358
  • 14
  • 72
  • 146
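On the signal question above: Scrapyd's cancel.json sends an OS signal to the spider's process (TERM by default on POSIX, per the Scrapyd docs), so one way to react is a plain signal handler that flips a flag the spider can check. A minimal stdlib sketch, independent of Scrapy's own signal machinery (which a real spider would need to coexist with):

```python
import os
import signal

class ShutdownFlag:
    """Records that a termination signal arrived, so a crawl loop can
    finish its current item and stop cleanly."""

    def __init__(self):
        self.requested = False
        # Scrapyd's cancel.json sends SIGTERM by default on POSIX.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        self.requested = True

flag = ShutdownFlag()
os.kill(os.getpid(), signal.SIGTERM)  # simulate Scrapyd's cancel
print(flag.requested)  # True: the handler runs before the next statement
```

Note that installing your own SIGTERM handler replaces the one Twisted installs; in a real spider you would more likely connect to Scrapy's `spider_closed` signal and use this only as a last resort.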
0
votes
1 answer

Scrapyd corrupting response?

I'm trying to scrape a specific website. The code I'm using to scrape it is the same as that being used to scrape many other sites successfully. However, the resulting response.body looks completely corrupt (segment…
AimiHat
  • 383
  • 4
  • 14
0
votes
1 answer

Scrapy Splash HTTP status code is not handled or not allowed

I'm using scrapy-splash to scrape a car dealership website that uses javascript to load the results but I keep getting error 504 Gateway Time-out. I have docker and Win10 and I don't think the problem is docker configuration because I can scrape…
user10931428