Highest Voted 'scrapy-pipeline' Questions

0

votes

1 answer

scrapy mysql returns empty results

My problem is that the scraped information won't show up in the database. My spider works fine when printing out the information, for example to a .json file. The pipelines.py: import sys import MySQLdb import hashlib from scrapy.exceptions import…

asked Aug 07 '16 at 13:49

brian

1
4

0

votes

1 answer

Scrapy image pipeline does not download images

I'm trying to set up image downloading from web pages using the Scrapy framework and django-item. I think I have done everything as in the docs, but after calling scrapy crawl I get a log looking like this: Scrapy log I can't find there any information on what…

django scrapy scrapy-pipeline

asked Jul 25 '16 at 19:42

Max

1
3

0

votes

1 answer

Scrapy: scrape items from HTML and not from URL

I came across Scrapy with a requirement for both crawling and scraping. But given the application requirements, I decided not to go with a monolithic approach. Everything should be service-based. So I decided to design two services. Get all urls and…

scrapy scrapyd scrapy-pipeline scrapy-shell

asked Jul 22 '16 at 09:31

SangamAngre

809
8
25

0

votes

0 answers

scrapy pipeline can't work

I wrote a few spiders in the same project, which all call the same pipeline class. The pipeline code looks like this: def process_item(self, item, spider): if spider.name == 'spider1': # do something if spider.name == 'spider2': # do…

python scrapy scrapy-pipeline

asked Jul 17 '16 at 14:34

Smith John

1,035
1
10
19
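
A minimal sketch of the pattern this excerpt describes, one pipeline class shared by several spiders and branching on spider.name. The class name SharedPipeline, the handled_by field, and the stand-in spider objects are hypothetical and only there so the example runs without a Scrapy project; the process_item(self, item, spider) signature is the one quoted in the question. One detail worth noting is that process_item returns the item, since later pipeline stages receive whatever it returns.

```python
from types import SimpleNamespace


class SharedPipeline:
    """Hypothetical pipeline used by more than one spider."""

    def process_item(self, item, spider):
        # Branch on the name of the spider that produced the item.
        if spider.name == "spider1":
            item["handled_by"] = "spider1 branch"
        elif spider.name == "spider2":
            item["handled_by"] = "spider2 branch"
        # Hand the item back so any later pipeline stage still receives it.
        return item


# Stand-in spiders: only the .name attribute matters for this sketch.
pipeline = SharedPipeline()
first = pipeline.process_item({}, SimpleNamespace(name="spider1"))
second = pipeline.process_item({}, SimpleNamespace(name="spider2"))
print(first, second)
```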

0

votes

2 answers

Scrapy returning blank csv

This is my first time using scrapy and I'm trying to put the information I need into a csv file using the pipeline. Everything seemed to be working fine until I tried to scrape from more than one page and it started to return a blank csv file. I…

csv scrapy scrapy-pipeline

asked Jul 11 '16 at 00:54

Matthew Barnette

3
3

0

votes

1 answer

How to save crawled web pages in memory using scrapy

I am able to crawl the web using the following Scrapy script: import scrapy from scrapy.linkextractors import LinkExtractor from scrapy.spiders import CrawlSpider, Rule from lxml import html from scrapy.contrib.spiders import CrawlSpider, Rule from…

web-scraping scrapy scrapy-pipeline

asked Jun 09 '16 at 11:02

nand

517
2
13
29

0

votes

1 answer

The setting ITEM_PIPELINE can't be overridden from terminal

In my settings.py I have: ITEM_PIPELINES = { 'turing.pipelines.InitFieldsNotInitializedPipeline': 299, 'turing.pipelines.SetNoneIfFieldEmptyPipeline': 300, 'turing.pipelines.CheckCategoryPipeline': 301, …

scrapy scrapyd scrapy-pipeline

asked May 27 '16 at 15:48

Luis G. Angel

31
3

0

votes

2 answers

Post Scrapy returned results to Parse

I made a pipeline PARSE = 'api.parse.com' PORT = 443 However, I can't find the right way to post the data to Parse, because every time it creates undefined objects in my Parse DB. class Newscrawlbotv01Pipeline(object): def process_item(self,…

parse-platform scrapy scrapy-pipeline

asked Mar 16 '16 at 16:15

Thomas Simonini

93
12

0

votes

1 answer

Up scaling image size downloaded by scrapy image pipeline

I'm trying to override the convert_image method in my pipeline.py from the inherited ImagesPipeline class, but it does not work as expected. Actually, I'm just trying to upscale the downloaded image to my requirements: 700px, but the downloaded images…

python-2.7 scrapy scrapy-pipeline

asked Mar 07 '16 at 15:00

Andronaute

379
3
12

0

votes

2 answers

404: Is there any way to avoid being blocked by website while scraping using scrapy

I was trying to use Scrapy to scrape about 70k items from a website, but every time, after it has scraped about 200 items, this error pops up for the rest: scrapy] DEBUG: Ignoring response <404 http://www.somewebsite.com/1234>: HTTP status code is not…

python scrapy scrapy-pipeline

asked Feb 02 '16 at 16:45

snowneji

1,086
1
11
25

0

votes

1 answer

scrapy pipeline error using mysql.connector

I am completely lost. Here's my pipeline. When I run it, I'm getting an error that says File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks current.result = callback(current.result, *args, **kw) …

python mysql scrapy scrapy-pipeline

asked Dec 31 '15 at 22:12

Nick

39
9

0

votes

2 answers

Scrapy: use pipeline to replace unwanted non-ASCII code

From the Scrapy results, one unwanted non-ASCII code \u2013 (aka character(150) or en dash) was in the title, such as u'Director/Senior Director \u2013 Pathology'. I am trying to use a pipeline to replace \u2013 with a regular ,. But the following code…

scrapy scrapy-pipeline

asked Oct 18 '15 at 05:22

LearnAWK

549
6
17
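
A hedged sketch of what such a pipeline could look like, assuming the item exposes a title field and behaves like a dict. The class name ReplaceEnDashPipeline is hypothetical, and this is not the asker's code, which is cut off in the excerpt. It swaps the en dash (U+2013) for a comma, as the question describes, and runs on a plain dict so no Scrapy project is needed to try it.

```python
class ReplaceEnDashPipeline:
    """Hypothetical pipeline: replace the en dash (U+2013) in 'title' with a comma."""

    def process_item(self, item, spider):
        title = item.get("title")
        # Only touch string titles; leave missing or non-string values alone.
        if isinstance(title, str):
            item["title"] = title.replace("\u2013", ",")
        return item


# Plain dict standing in for a scraped item; the spider argument is unused here.
item = {"title": "Director/Senior Director \u2013 Pathology"}
cleaned = ReplaceEnDashPipeline().process_item(item, spider=None)
print(cleaned["title"])
```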

0

votes

2 answers

Scrapy Item pipeline for multi spiders

I have 2 spiders and run them here: from scrapy.crawler import CrawlerProcess from scrapy.utils.project import get_project_settings settings = get_project_settings() process1 =…

python scrapy scrapy-pipeline

asked Aug 14 '15 at 13:27

user3430722

345
1
5
12

-1

votes

1 answer

Scrapy: Use Feed Exports after custom Item Pipeline without custom Feed Exporter class?

My Spider looks like this: class ExampleSpider(scrapy.Spider): name = 'example' custom_settings = { 'ITEM_PIPELINES': {'img_clear.pipelines.DuplicatesPipeline': 100,}, 'FEEDS': { 'feeds/example/tags.csv': { …

python scrapy pipeline scrapy-pipeline

asked Jan 23 '23 at 07:20

Pa7rickStar

1

-1

votes

1 answer

Python scrapy change csv column name

Scrapy is indeed powerful. But it lacks a basic feature. When exporting to csv there is no way to change the column names. By default it outputs the field names defined in the Item. The item names should be Python variables. However, at times we need to…

python scrapy scrapy-pipeline

asked Nov 07 '22 at 12:25

Shahid Karimi

4,096
17
62
104

Questions tagged [scrapy-pipeline]