Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `scrapy` itself, but was split out and is now maintained as a standalone project. It runs on a machine and allows you to deploy (i.e. upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
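The JSON web service mentioned above is plain HTTP; a minimal sketch of building a request for Scrapyd's `schedule.json` endpoint with only the standard library, assuming a Scrapyd instance at the default `http://localhost:6800` and hypothetical project/spider names:

```python
from urllib.parse import urlencode
from urllib.request import Request

SCRAPYD = "http://localhost:6800"  # default Scrapyd address (assumption)

def schedule_request(project, spider, **spider_args):
    """Build the POST request for Scrapyd's schedule.json endpoint.

    Extra keyword arguments are passed through as spider arguments.
    """
    body = urlencode({"project": project, "spider": spider, **spider_args})
    return Request(f"{SCRAPYD}/schedule.json", data=body.encode())

req = schedule_request("myproject", "myspider")  # hypothetical names
print(req.full_url)       # http://localhost:6800/schedule.json
print(req.data.decode())  # project=myproject&spider=myspider
```

To actually fire the job, pass the request to `urllib.request.urlopen` (or use any HTTP client such as `curl`) against a running Scrapyd instance.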

355 questions
0
votes
0 answers

Spider not crawling in scrapyd but works using scrapy crawl

I was tasked at work with pulling a ton of inventory data for a company that was acquired by the company I work for. I thought Scrapy would be a great tool for this, and so far I'm having fun. However, I am trying to utilize scrapyd and am running into a…
waltmagic
  • 631
  • 2
  • 9
  • 22
0
votes
0 answers

How do I use sudo in scrapyd

On CentOS, I want a regular user account to be able to use packages installed for the root account. When I run sudo scrapyd, the message "sudo: scrapyd: command not found" appears. How can I solve this problem? Thanks a lot for your help
WenTingTseng
  • 51
  • 1
  • 1
  • 2
0
votes
1 answer

Is there a way to handle dynamically loaded selectors with clicks on pages in scrapy-playwright?

I have a use case like this. Suppose I crawl a website abc.com using scrapy-playwright; the pages it loads are of 3 different types, like page1->#selector1, page2->#selector2, page3->#selector3, and the selector changes dynamically, so there is no…
0
votes
0 answers

Cannot deploy spiders with scrapyd-deploy (shows 0 spiders)

While attempting to deploy spiders to scrapyd running locally, I am getting the following response: {"node_name": "a-38u3442zr18hl", "status": "ok", "project": "project_name", "version": "1662563564", "spiders": 0} I have 6 spiders which are…
Tim_B
  • 129
  • 1
  • 1
  • 10
0
votes
0 answers

Scrapyd Spiders are going missing every 24 hours

I have a Scrapyd server in Heroku. It works fine, and the spider runs and connects to databases without any issue. I have set it to run every day via the scheduler in the ScrapydWeb UI. However, every day the spider seems to disappear and I would have to…
0
votes
1 answer

scrapyd deployed in KubeSphere, and when running scrapy with selenium got exception: 'twisted.internet.error.ReactorAlreadyInstalledError'

I deployed scrapyd in KubeSphere, and I got an exception when running scrapy with selenium: 2022-03-16T12:57:15+0000 [Launcher,1832/stderr] return Crawler(spidercls, self.settings, init_reactor=True) File…
何元斌
  • 1
  • 1
0
votes
2 answers

Scrapyd spiders are finished but they are still being shown as Running on WebUI as well as listjobs.json

I have deployed Scrapyd as a Docker container on Google Cloud Run. On my local machine, when I run the container, everything works fine. But when I deploy the same container on Google Cloud Run, spider jobs are not removed from the Running queue. Though…
0
votes
1 answer

How to resolve Python Module Not Found Error?

I am facing a problem importing modules in Python. I looked for a solution and found this, but it did not work either. My directory is as follows ->MyScrapper --->MyScrapper ----->db_connection.py --->Video_Scrapper ----->…
Asad Mehmood
  • 292
  • 2
  • 10
  • 20
0
votes
1 answer

How to pass custom settings arguments to scrapyd

I am trying to run several spiders on a cron-based schedule. I want to use scrapyd arguments to run the same spider with different settings. What I am trying to do: I set RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408] and a…
gerrel93
  • 89
  • 6
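On the question above: Scrapyd's schedule.json endpoint accepts repeated `setting` parameters of the form `NAME=value` to override Scrapy settings per run. A sketch of encoding such a POST body with the standard library (project/spider names hypothetical):

```python
from urllib.parse import urlencode

def schedule_body(project, spider, settings=None):
    """Encode a schedule.json POST body.

    `settings` maps Scrapy setting names to values; each entry is
    emitted as a repeated `setting=NAME=value` parameter, which is
    how Scrapyd expects per-job setting overrides.
    """
    pairs = [("project", project), ("spider", spider)]
    for name, value in (settings or {}).items():
        pairs.append(("setting", f"{name}={value}"))
    return urlencode(pairs)

body = schedule_body(
    "myproject", "myspider",  # hypothetical names
    {"RETRY_HTTP_CODES": "[500, 503, 504, 400, 403, 404, 408]"},
)
print(body)
```

Note that `urlencode` percent-encodes the inner `=` and the brackets, which is what Scrapyd expects in a form-encoded body.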
0
votes
1 answer

How to bind Heroku port to scrapyd

I created a simple Python app on Heroku to launch scrapyd. The scrapyd service starts, but it listens on port 6800. Heroku requires you to bind it to the $PORT variable, and I was able to run the Heroku app locally. The logs from the process are…
Matts
  • 1,301
  • 11
  • 30
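For context on the port-binding question above: Scrapyd takes its listening port from the `http_port` option in its config file rather than from the environment, so a common workaround on platforms that inject `$PORT` at runtime is a small launcher that writes the config before starting the daemon. A sketch under that assumption:

```python
import os

def render_scrapyd_conf(port):
    """Render a minimal scrapyd.conf binding Scrapyd to the given port.

    Scrapyd reads bind_address and http_port from the [scrapyd]
    section of its config file, not from environment variables.
    """
    return f"[scrapyd]\nbind_address = 0.0.0.0\nhttp_port = {port}\n"

if __name__ == "__main__":
    port = os.environ.get("PORT", "6800")  # Heroku injects PORT at runtime
    with open("scrapyd.conf", "w") as f:
        f.write(render_scrapyd_conf(port))
    # then start the daemon, e.g. os.execvp("scrapyd", ["scrapyd"])
```

The launcher would be referenced from the Procfile in place of invoking `scrapyd` directly.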
0
votes
1 answer

scrapyd-deploy not deploying project? Python Scrapy

I have set up a Scrapyd server. When I run the command scrapyd-deploy, it shows me the following message, but when I navigate to http://10.2.1.54:6800/jobs it shows nothing. Am I missing something?
Asad Mehmood
  • 292
  • 2
  • 10
  • 20
0
votes
1 answer

Unable to connect to Scrapyd web interface in Docker container from host

I have Django and Scrapyd running in separate containers. Django works just fine on my host's localhost:8001, but when I try to access Scrapyd via localhost:8000 I get a "connection reset" error from Firefox. I'm trying to connect both containers…
0
votes
1 answer

How to callback a Scrapy spider method upon receiving SIGINT or Scrapyd's cancel.json call?

In Scrapy, when we press CTRL+C we see "Received SIGINT, shutting down gracefully. Send again to force" in the logs, which comes from this code. Likewise, when we request Scrapyd's cancel.json, the following code is executed. I want to catch those signals in my Spider,…
Umair Ayub
  • 19,358
  • 14
  • 72
  • 146
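On the signal question above: Scrapyd's cancel.json sends an OS signal to the spider's process (TERM by default on POSIX, per the Scrapyd docs), so one way to react is a plain signal handler that flips a flag the spider can check. A minimal stdlib sketch, independent of Scrapy's own signal machinery (which a real spider would need to coexist with):

```python
import os
import signal

class ShutdownFlag:
    """Records that a termination signal arrived, so a crawl loop can
    finish its current item and stop cleanly."""

    def __init__(self):
        self.requested = False
        # Scrapyd's cancel.json sends SIGTERM by default on POSIX.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        self.requested = True

flag = ShutdownFlag()
os.kill(os.getpid(), signal.SIGTERM)  # simulate Scrapyd's cancel
print(flag.requested)  # True: the handler runs before the next statement
```

Note that installing your own SIGTERM handler replaces the one Twisted installs; in a real spider you would more likely connect to Scrapy's `spider_closed` signal and use this only as a last resort.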
0
votes
1 answer

Scrapyd corrupting response?

I'm trying to scrape a specific website. The code I'm using to scrape it is the same as that being used to scrape many other sites successfully. However, the resulting response.body looks completely corrupt (segment…
AimiHat
  • 383
  • 4
  • 14
0
votes
1 answer

Scrapy Splash HTTP status code is not handled or not allowed

I'm using scrapy-splash to scrape a car dealership website that uses javascript to load the results but I keep getting error 504 Gateway Time-out. I have docker and Win10 and I don't think the problem is docker configuration because I can scrape…
user10931428