Questions tagged [scrapy-pipeline]

218 questions
1 vote • 2 answers

Scrapy Pipeline to Parse

I made a pipeline to put Scrapy data into my Parse backend: PARSE = 'api.parse.com' PORT = 443 However, I can't find the right way to post the data to Parse, because every time it creates undefined objects in my Parse DB. class…
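A common cause of "undefined" objects is posting the Item's repr instead of a JSON body. A sketch of a pipeline that serializes the item and POSTs it to the Parse REST API (the `/1/classes/<name>` path and headers follow the old Parse REST docs; the class name and credentials are placeholders):

```python
import http.client
import json

class ParsePipeline:
    """Sketch: POST each scraped item to a Parse class over the REST API.
    PARSE_CLASS and the credentials are placeholders, not from the question."""
    PARSE_HOST = "api.parse.com"
    PARSE_PORT = 443
    PARSE_CLASS = "ScrapedItem"  # hypothetical Parse class name

    def __init__(self, app_id, api_key):
        self.app_id = app_id
        self.api_key = api_key

    def build_request(self, item):
        # Send plain JSON, not the Item object itself, or Parse stores junk.
        body = json.dumps(dict(item))
        headers = {
            "X-Parse-Application-Id": self.app_id,
            "X-Parse-REST-API-Key": self.api_key,
            "Content-Type": "application/json",
        }
        return "/1/classes/%s" % self.PARSE_CLASS, headers, body

    def process_item(self, item, spider):
        path, headers, body = self.build_request(item)
        conn = http.client.HTTPSConnection(self.PARSE_HOST, self.PARSE_PORT)
        conn.request("POST", path, body, headers)
        conn.getresponse().read()  # a 201 response carries the new objectId
        conn.close()
        return item
```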
1 vote • 1 answer

Scrapy spider that gets two pictures from the same page and names them differently

I'm new to both Python and Scrapy, so I'm not sure I've chosen the best method for doing this; my aim is to get two (or more) different pictures from a page and name them differently. How should I set up the pipeline? Should I do a…
brrrglund • 51 • 1 • 8
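One way (a sketch; the helper name, naming scheme, and meta keys are mine, not from the question) is to subclass `ImagesPipeline`, attach the item's name and a per-image index to each `Request` in `get_media_requests`, and have the `file_path` override build a distinct name with a helper like:

```python
import os
from urllib.parse import urlparse

def image_file_path(request_url, item_name, index):
    """Build a distinct filename per image on the page. Intended to be
    called from a custom ImagesPipeline.file_path() override (not shown),
    with item_name and index carried on Request.meta."""
    ext = os.path.splitext(urlparse(request_url).path)[1] or ".jpg"
    return "full/%s_%d%s" % (item_name, index, ext)
```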
1 vote • 0 answers

Scrapy - Invoke a new crawling process when a crawler finishes

I search for URLs - xxx.com/a, xxx.com/b, etc. - as found from two start_urls, xxx.com/LISTA and xxx.com/LISTB. Once this crawler has finished, I also want to crawl the pages xxx.com/x_in_database and xxx.com/y_in_database, whose URLs were…
dowjones123 • 3,695 • 5 • 40 • 83
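One low-tech option (a sketch; the item field and file name are made up): have a pipeline collect the follow-up URLs during the first crawl and dump them when the spider closes, so a second spider or a second run can read them as its start URLs. To chain the two runs inside one script, the Scrapy docs' "running multiple spiders sequentially" pattern with `CrawlerRunner` and chained deferreds applies.

```python
class CollectFollowupUrlsPipeline:
    """Sketch: accumulate URLs discovered during the first crawl and dump
    them on close, so a follow-up crawl can use the file as start_urls.
    'followup_url' is a hypothetical item field."""

    def open_spider(self, spider):
        self.urls = set()

    def process_item(self, item, spider):
        url = item.get("followup_url")
        if url:
            self.urls.add(url)
        return item

    def close_spider(self, spider):
        with open("followup_urls.txt", "w") as f:
            f.write("\n".join(sorted(self.urls)))
```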
1 vote • 1 answer

Pipeline for item not JSON serializable

I am trying to write the output of a scraped XML to JSON. The scrape fails due to an item not being serializable. The SO question "scrapy serializer" advises building a pipeline, but gives no answer, as that was out of scope for the question. So…
sayth • 6,696 • 12 • 58 • 100
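The usual fix is to give `json.dumps` a fallback encoder for the offending types (dates and sets are typical culprits). A minimal pipeline sketch along those lines, with an assumed output file name:

```python
import datetime
import json

class JsonWriterPipeline:
    """Sketch: write items as JSON lines, converting non-JSON-serializable
    values via a fallback encoder instead of letting json.dumps raise."""

    def open_spider(self, spider):
        self.file = open("items.jl", "w")

    def close_spider(self, spider):
        self.file.close()

    @staticmethod
    def _default(obj):
        # Called by json.dumps only for values it cannot serialize itself.
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
        if isinstance(obj, set):
            return sorted(obj)
        return str(obj)  # last resort: stringify anything else

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item), default=self._default) + "\n")
        return item
```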
1 vote • 1 answer

Cannot download images from website with scrapy

I'm starting with Scrapy in order to automate file downloading from websites. As a test, I want to download the jpg files from this website. My code is based on the intro tutorial and the Files and Images Pipeline tutorial in the Scrapy…
luchonacho • 6,759 • 4 • 35 • 52
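The minimal wiring for the built-in images pipeline, per the Scrapy docs, is the two settings below (the folder name is an assumption); the spider's items must then carry an `image_urls` list, and download results land in an `images` field:

```python
# settings.py -- enable the built-in images pipeline
ITEM_PIPELINES = {"scrapy.pipelines.images.ImagesPipeline": 1}
IMAGES_STORE = "downloaded_images"  # local target folder (assumption)
```

Note that `ImagesPipeline` also requires Pillow to be installed; without it the pipeline silently stays disabled and nothing is downloaded.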
0 votes • 2 answers

How to merge results of nested scrapy requests into a single item?

I have a URL that lists a bunch of universities. For every university, there is a link to a list of scholarships provided by that university. Inside this link (which contains a list of scholarships), there is a link to detailed information on…
0 votes • 1 answer

Google BigQuery Update is 70x slower than Insert. How to fix?

I'm using BigQuery as my DB with a Scrapy spider. Below are 2 pipelines to store data into the DB. One uses the Insert method, the other Update. The Update method is 70 times slower than Insert (merely 20 updated records per minute). Update takes 3.560 seconds…
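Each BigQuery DML UPDATE runs as its own job, so per-record updates pay that job overhead every time; the usual remedy is to buffer rows in the pipeline and issue one batched statement (often a single MERGE) per batch. A buffering skeleton with the actual BigQuery call left as a stub:

```python
class BatchedUpdatePipeline:
    """Sketch: buffer items and flush them in batches instead of issuing
    one BigQuery UPDATE job per record. _flush() is a stub -- wire it to
    your BigQuery client with one MERGE/DML job covering all rows."""
    BATCH_SIZE = 500

    def open_spider(self, spider):
        self.buffer = []
        self.flushed = []  # stands in for executed BigQuery jobs here

    def process_item(self, item, spider):
        self.buffer.append(dict(item))
        if len(self.buffer) >= self.BATCH_SIZE:
            self._flush()
        return item

    def close_spider(self, spider):
        if self.buffer:
            self._flush()  # don't lose the final partial batch

    def _flush(self):
        rows, self.buffer = self.buffer, []
        # real code: submit ONE job (e.g. a MERGE) covering all `rows`
        self.flushed.append(rows)
```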
0 votes • 1 answer

Scrapy: passing instance variables between pipelines

Does passing spider instance variables between pipelines work? Unfortunately I do not have the code, but I'll try to explain as briefly and clearly as possible. The order is the following: Pipeline_1: high priority (@700) Pipeline_2: low priority (@900) In…
The Doctor • 17 • 5
0 votes • 0 answers

Scrapy signals not connecting to class methods

I've defined a Crawler class for crawling multiple spiders from a script. For the spiders, instead of using pipelines, I defined a class, CrawlerPipeline, and used signals for connecting methods. In CrawlerPipeline, some methods need to use class…
rish_hyun • 451 • 1 • 7 • 13
0 votes • 1 answer

Scrapy item enriching from multiple websites

I implemented the following scenario with the Python Scrapy framework: class MyCustomSpider(scrapy.Spider): def __init__(self, name=None, **kwargs): super().__init__(name, **kwargs) self.days = getattr(self, 'days', None) def…
Gandalf • 155 • 1 • 12
0 votes • 1 answer

Scrapy item import error: No module found

I'm new to Scrapy and to Python, and when I try to import a class from items.py in VS Code I get the following error: Exception has occurred: ModuleNotFoundError No module named 'scraper.items'; 'scraper' is not a package My folder structure: Folder…
Ego0r • 9 • 1
0 votes • 1 answer

List elements retrieved by XPath in Scrapy do not output correctly item by item (for, yield)

I am outputting the URL of the first page of an exhibitor's order-results pages, extracted from a specific EC site, to a CSV file, reading it back in start_requests, and looping through it with a for statement. Each order results page contains…
K_MM • 35 • 5
0 votes • 1 answer

How to save Scrapy Broad Crawl Results?

Scrapy has a built-in way of persisting results to AWS S3 using the FEEDS setting, but for a broad crawl over different domains this would create a single file in which the results from all domains are saved. How could I save the results of each…
NightOwl • 1,069 • 3 • 13 • 23
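As far as I know the FEEDS URI templating covers the spider name and time, not the domain of each result, so a common workaround is a small custom pipeline that keeps one output file per domain (the Scrapy docs show the same idea with one item exporter per key). A sketch, assuming each item carries a `url` field:

```python
import json
from urllib.parse import urlparse

class PerDomainExportPipeline:
    """Sketch: route each item to its own JSON-lines file, keyed by the
    domain of the item's (assumed) 'url' field."""

    def open_spider(self, spider):
        self.files = {}

    def process_item(self, item, spider):
        domain = urlparse(item["url"]).netloc
        if domain not in self.files:  # lazily open one file per domain
            self.files[domain] = open("%s.jl" % domain, "w")
        self.files[domain].write(json.dumps(dict(item)) + "\n")
        return item

    def close_spider(self, spider):
        for f in self.files.values():
            f.close()
```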
0 votes • 1 answer

How to yield item from RFPDupeFilter or CustomFilter

I'm using Scrapy to crawl pages from different websites. With every scrapy.Request() I set some metadata which is used to yield an item. It's also possible that my code yields multiple scrapy.Request() for the same URL, however with different…
Kiran Kyle • 99 • 11