Questions tagged [scrapy-pipeline]
218 questions
1
vote
2 answers
Scrapy Pipeline to Parse
I made a pipeline to send Scrapy data to my Parse backend:
PARSE = 'api.parse.com'
PORT = 443
However, I can't find the right way to post the data to Parse, because every time it creates undefined objects in my Parse DB.
class…
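
A minimal sketch of a pipeline that POSTs each item to the Parse REST API over HTTPS; the class name ScrapedItem and the credential values are placeholders. Undefined/empty objects are usually the result of sending the payload without a JSON body and the application/json Content-Type header.

import json
import requests

class ParsePipeline:
    # Placeholder class endpoint on the Parse REST API (api.parse.com, port 443)
    PARSE_URL = 'https://api.parse.com/1/classes/ScrapedItem'

    def process_item(self, item, spider):
        response = requests.post(
            self.PARSE_URL,
            data=json.dumps(dict(item)),
            headers={
                'X-Parse-Application-Id': 'YOUR_APP_ID',      # placeholder credential
                'X-Parse-REST-API-Key': 'YOUR_REST_API_KEY',  # placeholder credential
                'Content-Type': 'application/json',
            },
        )
        response.raise_for_status()
        return item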

Thomas Simonini
- 93
- 12
1
vote
1 answer
Scrapy spider that gets two pictures from the same page, then names them differently
I'm new to both Python and Scrapy, so I'm not sure I've chosen the best method for doing this, but my aim is to get two (or more) different pictures from a page and name the pictures differently.
How should I set up the pipeline? Should I do a…
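
A minimal sketch of a custom ImagesPipeline for this, assuming the item carries an image_urls list and a name field (both field names are assumptions): each image request carries an index in its meta, and file_path() uses it to build distinct filenames.

from scrapy import Request
from scrapy.pipelines.images import ImagesPipeline

class RenamingImagesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # attach a per-image index and a base name to every download request
        for index, url in enumerate(item.get('image_urls', [])):
            yield Request(url, meta={'index': index, 'base': item.get('name', 'image')})

    def file_path(self, request, response=None, info=None, *, item=None):
        # e.g. "poster-0.jpg" and "poster-1.jpg" for two pictures on one page
        return f"{request.meta['base']}-{request.meta['index']}.jpg"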

brrrglund
- 51
- 1
- 8
1
vote
0 answers
Scrapy - Invoke a new crawling process when a crawler finishes
I search for URLs (xxx.com/a, xxx.com/b, etc.) as found from two start_urls, xxx.com/LISTA and xxx.com/LISTB.
Once this crawler has finished, I also want to crawl pages xxx.com/x_in_database and xxx.com/y_in_database, whose URLs were…
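
One way to do this is a sketch of the documented pattern for running spiders sequentially in one process: chain the crawls with CrawlerRunner so the second spider only starts, with URLs read from the database, after the first has finished. ListSpider and DatabaseSpider are hypothetical stand-ins.

import scrapy
from twisted.internet import defer, reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class ListSpider(scrapy.Spider):
    name = 'lists'
    start_urls = ['https://xxx.com/LISTA', 'https://xxx.com/LISTB']

    def parse(self, response):
        pass  # collect xxx.com/a, xxx.com/b, ... and store them in the database

class DatabaseSpider(scrapy.Spider):
    name = 'from_database'

    def start_requests(self):
        for url in []:  # assumption: read the URLs saved by ListSpider from your database
            yield scrapy.Request(url)

configure_logging()
runner = CrawlerRunner()

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(ListSpider)       # first crawl
    yield runner.crawl(DatabaseSpider)   # starts only after the first has finished
    reactor.stop()

crawl()
reactor.run()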

dowjones123
- 3,695
- 5
- 40
- 83
1
vote
1 answer
Pipeline for item not JSON serializable
I am trying to write the output of a scraped XML to JSON. The scrape fails due to an item not being serializable.
From this question it is advised that you need to build a pipeline; an answer was not provided there as it was out of scope for that question (SO: scrapy serializer).
So…
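
A minimal sketch of such a pipeline, which coerces values the JSON feed exporter cannot handle into plain serializable types before export; which types actually break in your case is an assumption, so adjust the checks to your item.

import datetime
import decimal
from itemadapter import ItemAdapter

class SerializablePipeline:
    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        for key, value in adapter.items():
            if isinstance(value, (datetime.datetime, datetime.date)):
                adapter[key] = value.isoformat()
            elif isinstance(value, decimal.Decimal):
                adapter[key] = float(value)
            elif not isinstance(value, (str, int, float, bool, list, dict, type(None))):
                adapter[key] = str(value)  # fall back to the string representation
        return item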

sayth
- 6,696
- 12
- 58
- 100
1
vote
1 answer
Cannot download images from website with scrapy
I'm starting with Scrapy in order to automate file downloading from websites. As a test, I want to download the .jpg files from this website. My code is based on the intro tutorial and the Files and Images Pipeline tutorial in the Scrapy…
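
For comparison, a minimal sketch of the pieces the Images Pipeline needs (the URL, the CSS selector and the storage folder are placeholders, and Pillow must be installed): the pipeline enabled in the settings, IMAGES_STORE set, and items carrying absolute URLs in image_urls.

# settings.py
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = 'downloaded_images'          # any writable directory

# spiders/jpg_spider.py
import scrapy

class JpgSpider(scrapy.Spider):
    name = 'jpg'
    start_urls = ['https://example.com/gallery']   # placeholder for the website in question

    def parse(self, response):
        # image_urls must contain absolute URLs, hence response.urljoin()
        yield {
            'image_urls': [response.urljoin(src)
                           for src in response.css('img::attr(src)').getall()],
        }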

luchonacho
- 6,759
- 4
- 35
- 52
0
votes
2 answers
How to merge results of nested scrapy requests into a single item?
I have a URL that has a bunch of universities. For every university, there is a link to a list of scholarships provided by the university. Inside this link (which contains a list of scholarships), there is a link to detailed information on…
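
A minimal sketch of the usual approach: carry one item through the nested requests with cb_kwargs and only yield it in the deepest callback. The selectors and field names are placeholders for the universities/scholarships pages.

import scrapy

class ScholarshipSpider(scrapy.Spider):
    name = 'scholarships'
    start_urls = ['https://example.com/universities']   # placeholder URL

    def parse(self, response):
        for uni in response.css('a.university::attr(href)').getall():
            item = {'university_url': response.urljoin(uni)}
            yield response.follow(uni, self.parse_scholarships, cb_kwargs={'item': item})

    def parse_scholarships(self, response, item):
        for link in response.css('a.scholarship::attr(href)').getall():
            # copy the partial item so each scholarship gets its own merged record
            yield response.follow(link, self.parse_detail, cb_kwargs={'item': dict(item)})

    def parse_detail(self, response, item):
        item['detail'] = response.css('h1::text').get()
        yield item   # one item holding data from all three levels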

aashish manandhar
- 49
- 6
0
votes
1 answer
Google BigQuery Update is 70x slower than Insert. How to fix?
I'm using BigQuery as my DB with a Scrapy spider.
Below are 2 pipelines to store data into the DB. One uses Insert, the other Update. The Update method is 70 times slower than Insert (merely 20 updated records per minute). An update takes 3.560 seconds…
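
Row-by-row UPDATE DML is what usually makes this slow. A sketch of a common workaround (not the pipelines from the question; the table and column names are placeholders): buffer the items, stream them into a staging table, and apply one set-based MERGE at the end.

from google.cloud import bigquery

class BigQueryMergePipeline:
    TARGET = 'my_project.my_dataset.items'           # placeholder target table
    STAGING = 'my_project.my_dataset.items_staging'  # placeholder staging table

    def open_spider(self, spider):
        self.client = bigquery.Client()
        self.buffer = []

    def process_item(self, item, spider):
        self.buffer.append(dict(item))
        if len(self.buffer) >= 500:      # flush in batches rather than per item
            self.flush()
        return item

    def flush(self):
        if self.buffer:
            self.client.insert_rows_json(self.STAGING, self.buffer)
            self.buffer = []

    def close_spider(self, spider):
        self.flush()
        merge_sql = f"""
            MERGE `{self.TARGET}` t
            USING `{self.STAGING}` s
            ON t.id = s.id
            WHEN MATCHED THEN UPDATE SET t.price = s.price
            WHEN NOT MATCHED THEN INSERT (id, price) VALUES (s.id, s.price)
        """
        self.client.query(merge_sql).result()   # one set-based statement instead of N updates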

Billy Jhon
- 1,035
- 15
- 30
0
votes
1 answer
Scrapy: passing instance variables between pipelines
Does passing spider instance variables between pipelines work?
Unfortunately I do not have the code, but I'll try to explain as briefly and clearly as possible.
Order is the following:
Pipeline_1: high priority (@700)
Pipeline_2: low priority (@900)
In…
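
This generally works for spider attributes, because every pipeline's process_item() is handed the same spider instance. A minimal sketch, assuming dict-like items and with the attribute and field names as assumptions:

class Pipeline_1:            # configured at priority 700, runs first
    def process_item(self, item, spider):
        spider.shared_state = getattr(spider, 'shared_state', {})
        spider.shared_state[item.get('id')] = 'seen in pipeline 1'
        return item

class Pipeline_2:            # configured at priority 900, runs second
    def process_item(self, item, spider):
        state = getattr(spider, 'shared_state', {})
        spider.logger.debug('Pipeline_1 left: %s', state.get(item.get('id')))
        return item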

The Doctor
- 17
- 5
0
votes
0 answers
Scrapy signals not connecting to class methods
I've defined a Crawler class for crawling multiple spiders from a script.
For the spiders, instead of using pipelines, I defined a class, CrawlerPipeline, and used signals to connect methods.
In CrawlerPipeline, some methods need to use class…
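
A minimal sketch of the usual way to connect class methods to signals: build the instance in from_crawler() and connect bound methods there, so self is available when the signal fires. The handlers item_scraped/spider_closed are just examples, not the question's actual methods.

from scrapy import signals

class CrawlerPipeline:
    @classmethod
    def from_crawler(cls, crawler):
        instance = cls()
        crawler.signals.connect(instance.item_scraped, signal=signals.item_scraped)
        crawler.signals.connect(instance.spider_closed, signal=signals.spider_closed)
        return instance

    def item_scraped(self, item, response, spider):
        self.last_item = item   # class state works because the connected method is bound

    def spider_closed(self, spider, reason):
        spider.logger.info('closed (%s); last item: %r',
                           reason, getattr(self, 'last_item', None))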

rish_hyun
- 451
- 1
- 7
- 13
0
votes
0 answers
(Scrapy-Redis) Error caught on signal handler:
I have three machines on Azure. One is for the Redis server; the others are crawlers. But they showed this message after a few hours. Has anyone encountered this situation before? Thanks.
2023-02-04 10:58:14 [scrapy.utils.signal] ERROR: Error caught on…

Ryan
- 1
0
votes
1 answer
Scrapy item enriching from multiple websites
I implemented the following scenario with the Python Scrapy framework:
class MyCustomSpider(scrapy.Spider):
    def __init__(self, name=None, **kwargs):
        super().__init__(name, **kwargs)
        self.days = getattr(self, 'days', None)

    def…
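
A minimal sketch of one way to continue this: pass the partially filled item to the request for the second website via cb_kwargs and yield it only once it has been enriched. The URLs, selectors and field names are placeholders.

import scrapy

class MyCustomSpider(scrapy.Spider):
    name = 'enricher'
    start_urls = ['https://site-a.example/list']        # placeholder first website

    def __init__(self, name=None, **kwargs):
        super().__init__(name, **kwargs)
        self.days = getattr(self, 'days', None)

    def parse(self, response):
        for row in response.css('div.entry'):
            item = {'title': row.css('a::text').get(),
                    'ref': row.css('a::attr(href)').get()}
            # second website queried with data taken from the first one
            yield scrapy.Request(f'https://site-b.example/lookup?ref={item["ref"]}',
                                 callback=self.enrich, cb_kwargs={'item': item})

    def enrich(self, response, item):
        item['extra'] = response.css('span.detail::text').get()
        yield item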

Gandalf
- 155
- 1
- 12
0
votes
1 answer
Scrapy item import error: No module found
I'm new to Scrapy and to Python, and when I try to import a class from items.py in VS Code I get the following error:
Exception has occurred: ModuleNotFoundError
No module named 'scraper.items'; 'scraper' is not a package
My folder structure:
Folder…
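
For reference, a sketch of the layout under which the import works ("scraper" stands in for the package name from the question; MyItem is a hypothetical item class): scraper/ must be a package with __init__.py, and the spider should be started with "scrapy crawl <name>" from the folder containing scrapy.cfg rather than by running the spider module directly in VS Code.

# project/
# ├── scrapy.cfg
# └── scraper/
#     ├── __init__.py
#     ├── items.py          # defines MyItem (hypothetical item class)
#     ├── pipelines.py
#     ├── settings.py
#     └── spiders/
#         ├── __init__.py
#         └── my_spider.py  # the file doing the import below
from scraper.items import MyItem   # works when run via "scrapy crawl" from the project root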

Ego0r
- 9
- 1
0
votes
1 answer
List elements retrieved by XPath in Scrapy do not output correctly item by item (for, yield)
I output the URL of the first page of an exhibitor's order results, extracted from a specific EC site, to a CSV file, read it in start_requests, and loop through it with a for statement.
Each order result page contains…
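
A minimal sketch of the pattern that usually fixes this: loop over the node selectors and use relative XPath (the leading dot) inside the loop, yielding one item per node instead of page-wide lists. The CSV file name, XPath expressions and field names are placeholders for the EC site.

import csv
import scrapy

class OrderSpider(scrapy.Spider):
    name = 'orders'

    def start_requests(self):
        # the CSV of first-page URLs produced earlier (file name is an assumption)
        with open('order_pages.csv', newline='') as f:
            for row in csv.reader(f):
                yield scrapy.Request(row[0], callback=self.parse)

    def parse(self, response):
        for order in response.xpath('//div[@class="order"]'):   # one selector per order result
            yield {
                'date':  order.xpath('.//span[@class="date"]/text()').get(),
                'total': order.xpath('.//span[@class="total"]/text()').get(),
            }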

K_MM
- 35
- 5
0
votes
1 answer
How to save Scrapy Broad Crawl Results?
Scrapy has a built-in way of persisting results in AWS S3 using the FEEDS setting.
But for a broad crawl over different domains, this would create a single file where the results from all domains are saved.
How could I save the results of each…
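
A minimal sketch of one common approach: a pipeline that keeps one JSON-lines exporter per domain and routes each item by the domain it came from. Deriving the domain from the item's "url" field is an assumption.

from urllib.parse import urlparse
from scrapy.exporters import JsonLinesItemExporter

class PerDomainExportPipeline:
    def open_spider(self, spider):
        self.exporters = {}   # domain -> (file handle, exporter)

    def process_item(self, item, spider):
        domain = urlparse(item['url']).netloc    # assumption: items carry their source URL
        if domain not in self.exporters:
            f = open(f'{domain}.jl', 'wb')
            exporter = JsonLinesItemExporter(f)
            exporter.start_exporting()
            self.exporters[domain] = (f, exporter)
        self.exporters[domain][1].export_item(item)
        return item

    def close_spider(self, spider):
        for f, exporter in self.exporters.values():
            exporter.finish_exporting()
            f.close()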

NightOwl
- 1,069
- 3
- 13
- 23
0
votes
1 answer
How to yield an item from RFPDupeFilter or a custom filter
I'm using Scrapy to crawl pages from different websites. With every scrapy.Request() I set some metadata which is used to yield an item. It's also possible that my code yields multiple scrapy.Request() for the same URL, however with different…
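
A dupefilter cannot yield items; it only decides whether a request gets scheduled. A sketch of a common workaround (URLs and field names are placeholders): send the repeated requests with dont_filter=True so RFPDupeFilter does not drop them, and let each callback yield its own item from its metadata.

import scrapy

class MultiMetaSpider(scrapy.Spider):
    name = 'multi_meta'

    def start_requests(self):
        # several requests for the same URL, each carrying different metadata;
        # dont_filter=True keeps the dupefilter from silently discarding them
        targets = [('https://example.com/page', {'source': 'list_a'}),
                   ('https://example.com/page', {'source': 'list_b'})]
        for url, meta in targets:
            yield scrapy.Request(url, callback=self.parse, dont_filter=True,
                                 cb_kwargs={'extra': meta})

    def parse(self, response, extra):
        # each duplicate request still produces its own item, with its own metadata
        yield {'url': response.url, 'title': response.css('title::text').get(), **extra}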

Kiran Kyle
- 99
- 11