Questions tagged [scrapyd]

`Scrapyd` is a daemon for managing `Scrapy` projects. It was originally part of `scrapy` itself, but was split out and is now a standalone project. It runs on a machine and lets you deploy (upload) your projects and control the spiders they contain through a JSON web service.

Scrapyd can manage multiple projects and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders.
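As a concrete illustration of that JSON web service, the sketch below schedules a spider run and then lists its jobs using scrapyd's documented `schedule.json` and `listjobs.json` endpoints. The host, project, and spider names are placeholders.

```python
import requests

SCRAPYD = "http://localhost:6800"  # scrapyd's default address; adjust to your host

# Schedule a run of a deployed spider ("myproject"/"myspider" are placeholders).
resp = requests.post(
    f"{SCRAPYD}/schedule.json",
    data={"project": "myproject", "spider": "myspider"},
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}

# List pending, running, and finished jobs for the same project.
jobs = requests.get(f"{SCRAPYD}/listjobs.json", params={"project": "myproject"}).json()
print(len(jobs["pending"]), len(jobs["running"]), len(jobs["finished"]))
```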

355 questions
1 vote • 1 answer

in `escape': undefined method `gsub' for # (NoMethodError)

Hi, I am trying to scrape a web page, take the links, go to those links, and scrape them too. require 'rubygems' require 'scrapi' require 'uri' Scraper::Base.parser :html_parser web = "http://......" def sub_web(linksubweb) uri =…
Mike Norton • 71 • 1 • 9
1 vote • 2 answers

Generic spider for a Scrapy project

I am creating a generic spider (a Scrapy spider) for multiple websites. Below is my project directory structure: myproject --- __init__.py --- common.py --- scrapy.cfg --- myproject ---__init__.py ---items.py …
AGR • 225 • 1 • 2 • 16
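For the generic-spider question above, one common pattern (a minimal sketch, not the asker's actual code) is a single spider class that receives its per-site details as spider arguments:

```python
import scrapy

class GenericSpider(scrapy.Spider):
    """One spider class reused across sites; site details arrive as arguments."""
    name = "generic"

    def __init__(self, start_url=None, allowed=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.start_urls = [start_url] if start_url else []
        self.allowed_domains = [allowed] if allowed else []

    def parse(self, response):
        # Placeholder extraction; real per-site rules would be plugged in here.
        yield {"url": response.url, "title": response.css("title::text").get()}
```

Run it as `scrapy crawl generic -a start_url=https://example.com -a allowed=example.com`; scrapyd's `schedule.json` forwards extra POST parameters to the spider as arguments in the same way.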
1 vote • 1 answer

MySQL not saving data that's being scraped

I made a small project using Scrapy. The thing is that my spider is crawling pages and scraping data, but the data is not being saved into my database. I am using MySQL as my database. I guess there is something I am missing in my pipelines.py…
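For the question above, the usual suspect is the item pipeline. A minimal sketch of a MySQL pipeline, assuming the `pymysql` driver and a hypothetical `pages` table (any DB-API driver works the same way):

```python
import pymysql  # assumption: pymysql is installed


class MySQLPipeline:
    def open_spider(self, spider):
        # Hypothetical connection details; replace with your own.
        self.conn = pymysql.connect(
            host="localhost", user="root", password="secret", database="scraped"
        )
        self.cur = self.conn.cursor()

    def process_item(self, item, spider):
        # Hypothetical table and columns, purely for illustration.
        self.cur.execute(
            "INSERT INTO pages (url, title) VALUES (%s, %s)",
            (item.get("url"), item.get("title")),
        )
        self.conn.commit()  # a missing commit is a classic "crawls fine, saves nothing" bug
        return item

    def close_spider(self, spider):
        self.conn.close()
```

The pipeline also has to be registered in settings.py, e.g. `ITEM_PIPELINES = {"myproject.pipelines.MySQLPipeline": 300}` (path hypothetical); an unregistered pipeline silently never runs.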
1 vote • 0 answers

Scraped items not being saved into the database

My Scrapy project is not saving data into the database. It is scraping the data, but not adding it to the database. Please look into the code and suggest something. My spider.py file: from scrapy.spider import BaseSpider from…
Abhimanyu • 81 • 6
1 vote • 1 answer

Scrapy deploy stopped working

I am trying to deploy a Scrapy project using scrapyd, but it is giving me an error: sudo scrapy deploy default -p eScraper Building egg of eScraper-1371463750 'build/scripts-2.7' does not exist -- can't clean it zip_safe flag not set; analyzing archive…
Vaibhav Jain • 5,287 • 10 • 54 • 114
1 vote • 0 answers

Scrapy: having problems crawling a .aspx page

I'm trying to crawl a .aspx page, but it redirects me to a page which doesn't exist. To solve this, I tried to set 'dont_merge_cookies': True and 'dont_redirect': True and to override my start_requests, but now it gives me an error: "'Response'…
user_2000 • 1,103 • 3 • 14 • 26
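Regarding the .aspx question above: `dont_redirect` and `dont_merge_cookies` are real `Request.meta` keys, but with redirects disabled the 3xx response itself reaches the callback, so it must also be whitelisted via `handle_httpstatus_list`. A sketch (URL and spider name are placeholders):

```python
import scrapy

class AspxSpider(scrapy.Spider):
    name = "aspx_example"                            # hypothetical name
    start_urls = ["https://example.com/page.aspx"]   # placeholder URL

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                meta={
                    "dont_redirect": True,            # keep the 302 response
                    "handle_httpstatus_list": [302],  # let it reach the callback
                    "dont_merge_cookies": True,       # bypass the cookie jar
                },
                callback=self.parse,
            )

    def parse(self, response):
        # With dont_redirect set, parse() receives the redirect response itself.
        yield {"status": response.status, "location": response.headers.get("Location")}
```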
1 vote • 0 answers

Running scrapy commands using os.system or subprocess.call

I have a Scrapy project with a web-based interface running on Apache (XAMPP) that allows the user to create, modify and schedule spiders and also includes a call to scrapyd at port 6800 to get the pending/running/finished spiders. It all works…
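For the question above, a sketch of the `subprocess` side: passing the command as an argument list avoids the shell-quoting pitfalls of `os.system`, and the working directory must be the Scrapy project root (paths and names are placeholders):

```python
import subprocess

# Run a spider as a child process. Scrapy writes its log to stderr.
result = subprocess.run(
    ["scrapy", "crawl", "myspider"],  # "myspider" is a placeholder
    cwd="/path/to/scrapy/project",    # must be inside the Scrapy project
    capture_output=True,
    text=True,
)
print(result.returncode)
print(result.stderr[-500:])  # tail of the crawl log
```

When the crawls are already managed by scrapyd, as in this setup, polling `listjobs.json` as shown earlier is usually simpler than spawning processes from the web app.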
1 vote • 2 answers

libxml2 or lxml error when trying to run the command "scrapy crawl test"

I have the following source code: # Spider class test_crawler(BaseSpider): name = 'test' allowed_domains = ['http://test.com'] start_urls = ['http://test.com/test'] def parse(self, response): hxs =…
Thinh Phan • 655 • 1 • 14 • 27
0 votes • 0 answers

Beginner question regarding Scrapy and scrapy crawl

I recently began learning web scraping with Scrapy and tried running scrapy crawl against books.toscrape.com. According to the terminal, the scrapy crawl call works fine, but it doesn't return the item count nor does it show any of the…
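A frequent cause of the "zero items" symptom described above is a `parse` method that prints instead of yielding; Scrapy's item counter only counts yielded items. A minimal books.toscrape.com sketch (selectors taken from the public tutorial site):

```python
import scrapy

class BooksSpider(scrapy.Spider):
    name = "books"  # hypothetical spider name
    start_urls = ["https://books.toscrape.com/"]

    def parse(self, response):
        # Each yielded dict increments the "item_scraped_count" stat;
        # printing instead of yielding leaves the counter at zero.
        for book in response.css("article.product_pod"):
            yield {
                "title": book.css("h3 a::attr(title)").get(),
                "price": book.css("p.price_color::text").get(),
            }
```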
0 votes • 0 answers

Scrapyd launch failure on an imported DigitalOcean droplet

A server image has been exported and imported into a new DigitalOcean account. I've created a droplet from it, with authentication through SSH keys rather than password authentication (see note). Now it's running, yet when I launch scrapyd in the console, the following…
Igor Savinkin • 5,669 • 8 • 37 • 69
0 votes • 0 answers

How to resume scrapy crawler on startup through scrapyd?

I am trying to run the Scrapy crawler through scrapyd with JOBDIR. I have a script in which I am sending a POST request to the scrapyd server: scrapyd_script: import requests import json import logging from datetime import…
X-somtheing • 219 • 2 • 10
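For the resume question above: scrapyd's `schedule.json` accepts a `setting` parameter, so `JOBDIR` can be passed per job; reusing the same directory on the next run is what lets Scrapy resume a paused crawl. A sketch with placeholder names and paths:

```python
import requests

# Schedule a crawl through scrapyd, passing JOBDIR so Scrapy persists its
# request queue on disk and can pick up where it left off after a restart.
resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={
        "project": "myproject",                        # placeholder
        "spider": "myspider",                          # placeholder
        "setting": "JOBDIR=/var/lib/scrapy/jobs/run1", # same path each run to resume
    },
)
print(resp.json())
```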
0 votes • 0 answers

Deploying a spider with a git repo dependency fails eggification

I have a Scrapy project (hereafter called the_application) that has a dependency on a library (hereafter called the_library) fetched from a git repository, and every time I attempt to deploy the Scrapy project by running scrapyd-deploy…
Hrafn • 2,867 • 3 • 25 • 44
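Context for the eggification question above: `scrapyd-deploy` builds the egg from an auto-generated setup.py along the lines of the sketch below, and that egg does not bundle git (or any other) dependencies; those have to be installed on the scrapyd host separately (e.g. `pip install git+https://...`). The name `the_application` is taken from the question.

```python
# Rough shape of the setup.py that scrapyd-deploy generates for the egg build.
from setuptools import setup, find_packages

setup(
    name="the_application",
    version="1.0",
    packages=find_packages(),
    # The "scrapy" entry point tells scrapyd which settings module to load.
    entry_points={"scrapy": ["settings = the_application.settings"]},
)
```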
0 votes • 0 answers

Docker image runs fine on local machine, but fails with "/usr/local/bin/scrapyd -n: Unknown command: scrapyd" when deployed on heroku

This is my Dockerfile: FROM python:3.10 WORKDIR /usr/src/app COPY requirements.txt ./ RUN pip install --no-cache-dir -r requirements.txt COPY CollegeXUniversityDataScraper ./CollegeXUniversityDataScraper/ COPY scrapyd.conf ./ ENTRYPOINT […
0 votes • 0 answers

Scrapyd deploy failing on Python 3.8

Stats: I start Scrapyd in env: (env) sh-3.2$ scrapyd 2023-01-18T14:44:21+0400 [-] Loading /Users/parikshit.mukherjee/PycharmProjects/nn/ufc-data-crawler/env/lib/python3.8/site-packages/scrapyd/txapp.py... 2023-01-18T14:44:21+0400 [-] Basic…
0 votes • 1 answer

How to correctly configure CONCURRENT_REQUESTS in a project with multiple spiders

I have a Scrapy project with ~10 spiders, and I run a few of them simultaneously using Scrapyd. However, I have doubts about whether my CONCURRENT_REQUESTS setting is correct. Currently my CONCURRENT_REQUESTS is 32, but I have seen it recommended that…
Jalil SA • 26 • 3
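One point worth adding to that last question: scrapyd starts each spider in its own process, so `CONCURRENT_REQUESTS` caps each running spider separately rather than the project as a whole. A sketch of the related settings (values are illustrative, not tuning advice):

```python
# settings.py -- illustrative values, not recommendations
CONCURRENT_REQUESTS = 32             # cap across all domains, per crawler process
CONCURRENT_REQUESTS_PER_DOMAIN = 8   # cap per target domain
DOWNLOAD_DELAY = 0.25                # seconds between requests to the same domain
```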