Highest Voted 'scrapy-pipeline' Questions

0

votes

1 answer

python scrapy pipeline suddenly doesn't work

It is very weird: I wrote the Scrapy code with its pipeline and crawled a huge amount of data, and it always worked well. Today, when I re-ran the same code, it suddenly didn't work at all. Here are the details: My Spider - base_url_spider.py import…

python-3.x scrapy scrapy-pipeline

asked Apr 24 '17 at 22:42

Cherry Wu

3,844
9
43
63

0

votes

1 answer

Scrapy Pipeline unknown number of results

I have a scrapy spider which gets the start_urls from a MySQL database. When it scrapes each page it comes back with an unknown number of links, meaning it could have zero links or up to 10 links from each page that it scrapes. Because that number…

python scrapy scrapy-pipeline

asked Apr 14 '17 at 15:47

SDailey

17
3

0

votes

1 answer

Scrapy Pipeline SQL Syntax error

I have a spider that grabs URLs from a MySQL DB and uses those URLs as the start_urls to scrape, which in turn grabs any number of new links from the scraped pages. When I set the pipeline to INSERT both the start_url and the newly scraped URL to a new…

python scrapy scrapy-pipeline

asked Apr 13 '17 at 16:40

SDailey

17
3
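A minimal sketch related to the question above, not the asker's code (the excerpt is cut off, so the actual cause of the syntax error is not visible). It shows one common way to avoid SQL syntax errors when inserting scraped URLs: pass the values as query parameters instead of formatting them into the SQL string. The standard-library `sqlite3` module stands in for MySQL here so the example runs anywhere; the table name `links`, its columns, and the helper `save_links` are hypothetical.

```python
import sqlite3


def save_links(conn, start_url, scraped_urls):
    """Insert one (start_url, scraped_url) row per scraped link.

    Works for any number of links, including zero.
    """
    rows = [(start_url, url) for url in scraped_urls]
    # Placeholders let the driver handle quoting, so quotes inside a URL
    # cannot break the statement. sqlite3 uses "?"; MySQL drivers such as
    # PyMySQL use "%s" as the placeholder instead.
    conn.executemany(
        "INSERT INTO links (start_url, scraped_url) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


# In-memory database standing in for the real MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (start_url TEXT, scraped_url TEXT)")

inserted = save_links(
    conn,
    "https://example.com/",
    ["https://example.com/a?x='1'", "https://example.com/b"],
)
inserted_none = save_links(conn, "https://example.com/empty", [])
row_count = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]
```

In a Scrapy pipeline, a call like `save_links` would typically sit inside `process_item`, with the connection opened in `open_spider` and closed in `close_spider`.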

0

votes

1 answer

How can I check if Scrapy Image Pipeline is using a proxy to download images?

I have built a scraper and would like to download some images using a proxy in Scrapy. I don't know if it is really downloading through the proxy. The response headers don't show the IP. Furthermore, if I change the IP to a random IP, it still downloads…

python proxy scrapy web-crawler scrapy-pipeline

asked Mar 24 '17 at 13:13

zer02

3,963
4
31
66

0

votes

1 answer

Scrapy Regex Custom Pipeline

This is my Scrapy custom regex pipeline code: for p in item['code']: for search_type, pattern in RegEx.regexp.iteritems(): s = re.findall(pattern, p) if s: return item else: …

python scrapy scrapy-pipeline

asked Jan 30 '17 at 18:32

Stuart

11

0

votes

0 answers

How to download images from dynamically generated hashed url using scrapy?

I am using Scrapy to download images from the website https://pixabay.com/. My working code is below: from scrapy.spiders import Spider from scrapy.selector import Selector from scrapy.http import Request from website.imageItems import…

scrapy scrapy-pipeline

asked Jan 24 '17 at 03:16

Bit_hunter

789
2
8
25

0

votes

2 answers

Retrieve HTTP return code from ImagesPipeline (or MediaPipeline) in scrapy

I have a working spider scraping image URLs and placing them in image_urls field of a scrapy.Item. I have a custom pipeline that inherits from ImagesPipeline. When a specific URL returns a non-200 http response code (like say a 401 error). For…

python scrapy scrapy-pipeline scrapy-shell

asked Jan 19 '17 at 19:57

hAcKnRoCk

1,118
3
16
30

0

votes

2 answers

Crawl website from list of values using scrapy

I have a list of NPIs for which I want to scrape the provider names from npidb.org. The NPI values are stored in a CSV file. I am able to do it manually by pasting the URLs in the code. However, I am unable to figure out how to do it if I…

python scrapy scrapy-pipeline

asked Jan 18 '17 at 17:19

infinite-rotations

97
1
8
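A minimal sketch of one way to approach the question above: read the NPI values from the CSV file and build one URL per value. The CSV layout (one header row, NPI in the first column), the helper name `urls_from_csv`, and the `URL_TEMPLATE` pattern are assumptions for illustration; the real npidb.org URL layout is not given in the excerpt, so a placeholder domain is used.

```python
import csv
import io

# Placeholder pattern (assumption); substitute the site's real URL layout.
URL_TEMPLATE = "https://example.com/npi/{npi}"


def urls_from_csv(csv_file, template=URL_TEMPLATE):
    """Yield one URL per NPI value found in the first column of csv_file."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row (assumed to exist)
    for row in reader:
        # csv.reader returns [] for blank lines; skip those and empty cells
        if row and row[0].strip():
            yield template.format(npi=row[0].strip())


# An in-memory file stands in for the real CSV on disk.
sample = io.StringIO("npi\n1234567890\n\n1098765432\n")
urls = list(urls_from_csv(sample))
```

Inside a spider, `start_requests` could iterate over `urls_from_csv(open(path, newline=""))` and yield a `scrapy.Request` for each URL, which removes the need to paste URLs into the code by hand.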

0

votes

0 answers

Scrapy - Pipe data to database if keyword match found

Put simply, I'm scraping web data in Scrapy. I need to analyse the scraped data for keywords / regex and if matched, pipeline the data to database. If not found, drop. My question is: should/can I do this from within Scrapy and if so do you have…

python scrapy scrapy-pipeline

asked Jan 18 '17 at 16:04

Stuart

11
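A minimal sketch of the filtering step described in the question above, done inside Scrapy as an item pipeline: items whose text matches a pattern are passed on, and the rest are dropped. The field name `text`, the example pattern, and the class name `KeywordFilterPipeline` are hypothetical. The `try`/`except` around the import exists only so the sketch also runs where Scrapy is not installed.

```python
import re

try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Stand-in so the sketch runs without Scrapy installed (assumption).
    class DropItem(Exception):
        pass


class KeywordFilterPipeline:
    """Pass items whose text matches any pattern; drop all others."""

    # Hypothetical patterns and field name, for illustration only.
    patterns = [re.compile(r"\bscrapy\b", re.IGNORECASE)]
    text_field = "text"

    def process_item(self, item, spider):
        text = item.get(self.text_field, "")
        if any(p.search(text) for p in self.patterns):
            # Returning the item hands it to the next pipeline,
            # for example one that writes to the database.
            return item
        # Raising DropItem stops further pipeline processing for this item.
        raise DropItem("no keyword match")
```

To run the filter before a database pipeline, it would be listed in `ITEM_PIPELINES` with a lower number than the database pipeline, since Scrapy runs pipelines in ascending order of that value.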

0

votes

1 answer

Scrapy Get returned Value from pipeline

I'm trying to get returned value from pipeline. I'm using yield generator to generate item. And this is my code. def get_or_create(model): model_class = type(model) created = False try: obj =…

python django scrapy scrapy-pipeline

asked Jan 06 '17 at 17:41

Murat Kaya

1,281
3
28
52

0

votes

1 answer

Below POST Method is not working in scrapy

I have tried with headers, cookies, formdata and body too, but I got 401 and 500 status codes. On this site, the first page uses the GET method and gives an HTML response, and further pages use the POST method and give a JSON response. But these status codes arrive…

python-2.7 scrapy scrapy-pipeline scrapy-shell

asked Dec 10 '16 at 10:46

Vimal Annamalai

139
1
2
12

0

votes

1 answer

Scrapy Only Cache Images

I thought I had found a solution using the RFC2616 policy, but in testing, the scraper execution time seems to still stay the same. So I went back to the Default Policy. I'm directing my image_urls to 'production.pipelines.MyImagesPipeline'. Now I only…

python python-2.7 scrapy scrapy-pipeline

asked Nov 29 '16 at 23:52

Kevin G

2,325
3
16
30

0

votes

2 answers

Can't get value from Scrapy stats dictionary

I have this pipeline in my Scrapy project where I need to get a value from the Scrapy stats: class MyPipeline(object): def __init__(self, stats): self.stats = stats @classmethod def from_crawler(cls, crawler): return…

python dictionary scrapy keyerror scrapy-pipeline

asked Nov 17 '16 at 22:38

Aminah Nuraini

18,120
8
90
108
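A minimal sketch related to the question above. The excerpt is truncated, so the cause of the asker's `KeyError` is not visible; this only illustrates one defensive way to read a stat. It uses the stats collector's `get_value(key, default)` method, which returns the default when the key has not been set yet, instead of indexing a dictionary directly. The class name `StatsAwarePipeline` and the attribute `scraped` are hypothetical; `item_scraped_count` is the stat Scrapy records once items have been scraped.

```python
class StatsAwarePipeline:
    """Pipeline that receives the crawler's stats collector."""

    def __init__(self, stats):
        self.stats = stats
        self.scraped = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook and exposes the collector as crawler.stats.
        return cls(crawler.stats)

    def process_item(self, item, spider):
        return item

    def close_spider(self, spider):
        # get_value falls back to the default (0) when the key is absent,
        # so a stat that was never set does not raise KeyError.
        self.scraped = self.stats.get_value("item_scraped_count", 0)
```

Stats are filled in as the crawl progresses, so a key can be missing early in a run even though it appears in the final stats dump.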

0

votes

0 answers

Debugging Scrapy Item pipeline

I am trying to persist the scraped items into MySQL running on localhost. Even though the spider crawls the sites and scrapes items in the intended way, my pipeline object for persisting does not work - it does not store items into the…

python mysql scrapy scrapy-pipeline

asked Nov 06 '16 at 23:16

zmg

1
4

0

votes

2 answers

Use scrapy as an item generator

I have an existing script (main.py) that requires data to be scraped. I started a scrapy project for retrieving this data. Now, is there any way main.py can retrieve the data from scrapy as an Item generator, rather than persisting data using the…

python scrapy scrapy-pipeline

asked Sep 15 '16 at 09:53

bsuire

1,383
2
18
27
