Questions tagged [scrapy-pipeline]

218 questions
0 votes, 1 answer

scrapy mysql returns empty results

My problem is that the scraped information won't show up in the database. The spider itself works fine and outputs the information, for example to a .json file. The pipelines.py: import sys import MySQLdb import hashlib from scrapy.exceptions import…
brian
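A frequent reason rows never appear is a missing commit. DB-API drivers such as MySQLdb start with autocommit off, so the inserts stay in an uncommitted transaction. The excerpt is cut off, so this is only a likely cause, not a confirmed one. The sketch uses the standard-library sqlite3 module as a stand-in so it runs anywhere. The table `items` and the fields `title` and `url` are hypothetical. With MySQLdb you would swap the `connect` call and use `%s` placeholders instead of `?`.

```python
import sqlite3


class SQLitePipeline:
    """Sketch of a DB-API item pipeline; sqlite3 stands in for MySQLdb here."""

    def __init__(self, db_path=":memory:"):
        # Hypothetical: with MySQLdb you would pass host/user/passwd/db instead.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Open one connection per crawl rather than one per item.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (title TEXT, url TEXT)"  # hypothetical schema
        )

    def process_item(self, item, spider):
        # Parameterized query: sqlite3 uses "?" placeholders, MySQLdb uses "%s".
        self.conn.execute(
            "INSERT INTO items (title, url) VALUES (?, ?)",
            (item.get("title"), item.get("url")),
        )
        # Without a commit, DB-API drivers (MySQLdb included) keep the insert
        # in an open transaction that other connections cannot see.
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```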
0 votes, 1 answer

Scrapy image pipeline does not download images

I'm trying to set up image downloading from web pages using the Scrapy framework and django-item. I think I have done everything as described in the docs, but after running scrapy crawl I get a log like this: Scrapy log. I can't find any information there on what…
Max
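For comparison, here is a minimal settings.py for the built-in ImagesPipeline. It assumes Scrapy 1.0 or later and the default field names. The pipeline also needs Pillow installed. It is worth checking that ImagesPipeline is listed on the "Enabled item pipelines" line near the top of the crawl log. Whatever item class is used, a DjangoItem included, must expose `image_urls` and `images` fields. The storage path is a placeholder.

```python
# settings.py (sketch): the minimum the built-in ImagesPipeline needs.
ITEM_PIPELINES = {
    # Built-in pipeline path in Scrapy >= 1.0; older releases used
    # "scrapy.contrib.pipeline.images.ImagesPipeline".
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Directory where downloaded images are written (placeholder path).
IMAGES_STORE = "/tmp/scrapy-images"

# These are the defaults, spelled out: the spider must fill "image_urls" with
# a list of absolute URLs, and the pipeline writes its results to "images".
IMAGES_URLS_FIELD = "image_urls"
IMAGES_RESULT_FIELD = "images"
```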
0 votes, 1 answer

Scrapy: scrape items from HTML and not from URL

I came across Scrapy with a requirement for both crawling and scraping. But because of the application's requirements, I decided not to go with a monolithic approach: everything should be service-based. So I decided to design two services. Get all urls and…
SangamAngre
0 votes, 0 answers

Scrapy pipeline doesn't work

I have written a few spiders in the same project, and they all call the same pipeline class. The pipeline code looks like this: def process_item(self, item, spider): if spider.name == 'spider1': # do something if spider.name == 'spider2': # do…
Smith John
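The same idea can be written without a chain of `if` statements by routing each item to a method named after the spider. Whatever the structure, check that `process_item` returns the item on every path (or raises `DropItem`). A branch that falls through returns `None`, so later pipelines and exporters get nothing useful. The method names and the `source` field below are hypothetical. The trick only works for spider names that are valid Python identifiers.

```python
class MultiSpiderPipeline:
    """Sketch: route items to a per-spider method instead of chained ifs."""

    def process_item(self, item, spider):
        # Look up e.g. process_spider1 for a spider named "spider1".
        handler = getattr(self, f"process_{spider.name}", None)
        if handler is None:
            # Unknown spider: pass the item through untouched so later
            # pipelines (and feed exports) still receive it.
            return item
        return handler(item)

    def process_spider1(self, item):
        item["source"] = "spider1"  # hypothetical field
        return item

    def process_spider2(self, item):
        item["source"] = "spider2"  # hypothetical field
        return item
```

If the spiders need entirely different pipelines, each spider can instead set its own `ITEM_PIPELINES` in `custom_settings`.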
0 votes, 2 answers

Scrapy returning blank csv

This is my first time using Scrapy, and I'm trying to write the information I need to a CSV file using a pipeline. Everything seemed to work fine until I tried to scrape more than one page; then it started returning a blank CSV file. I…
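The question's code is cut off, so the cause is a guess. Two common reasons for an empty or truncated CSV are opening the file inside `process_item` with mode `"w"`, which truncates it on every call, and never closing the file. The sketch opens the file once in `open_spider` and closes it in `close_spider`. The path and the `title`/`url` field names are hypothetical.

```python
import csv


class CsvWriterPipeline:
    """Sketch: open the CSV once per crawl and append one row per item."""

    fieldnames = ["title", "url"]  # hypothetical item fields

    def __init__(self, path="output.csv"):
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Opening in "w" mode once is fine; doing it inside process_item
        # would truncate the file for every item.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(
            self.file, fieldnames=self.fieldnames, extrasaction="ignore"
        )
        self.writer.writeheader()

    def process_item(self, item, spider):
        self.writer.writerow(dict(item))
        return item

    def close_spider(self, spider):
        # Closing flushes buffered rows; skipping it can leave the file empty.
        self.file.close()
```

When no post-processing is needed, Scrapy's built-in feed export (`scrapy crawl <spider> -o out.csv`) avoids a custom pipeline altogether.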
0 votes, 1 answer

How to save crawled web pages in memory using scrapy

I am able to crawl the web using the following Scrapy script: import scrapy from scrapy.linkextractors import LinkExtractor from scrapy.spiders import CrawlSpider, Rule from lxml import html from scrapy.contrib.spiders import CrawlSpider, Rule from…
nand
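One way to keep pages in memory is a pipeline that stores each page in a dict keyed by URL. This assumes the spider yields items such as `{"url": response.url, "html": response.text}`; the field names are hypothetical. Memory grows with every page, so this only suits modest crawls. Attaching the dict to the spider at the end is just one possible hand-off, also hypothetical.

```python
class InMemoryPagePipeline:
    """Sketch: keep crawled pages in a dict instead of writing them to disk."""

    def __init__(self):
        # url -> HTML text; grows by one entry per unique page crawled.
        self.pages = {}

    def process_item(self, item, spider):
        # Assumes hypothetical "url" and "html" fields on each item.
        self.pages[item["url"]] = item["html"]
        return item

    def close_spider(self, spider):
        # The crawl is over: expose the pages on the spider object
        # (hypothetical attribute) for whatever code runs next.
        spider.pages = self.pages
```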
0 votes, 1 answer

The setting ITEM_PIPELINES can't be overridden from the terminal

In my settings.py I have: ITEM_PIPELINES = { 'turing.pipelines.InitFieldsNotInitializedPipeline': 299, 'turing.pipelines.SetNoneIfFieldEmptyPipeline': 300, 'turing.pipelines.CheckCategoryPipeline': 301, …
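The likely explanation is that `-s NAME=VALUE` always passes the value as a string. For dict settings, Scrapy's `Settings.getdict()` parses a string value as JSON, so the override must be a JSON object, not a Python dict literal. The snippet builds such a command line. `myspider` is a placeholder, and the two pipelines picked from the question's settings are only an example.

```python
import json
import shlex

# Example override: run with only two of the project's pipelines.
pipelines = {
    "turing.pipelines.InitFieldsNotInitializedPipeline": 299,
    "turing.pipelines.CheckCategoryPipeline": 301,
}

# The value after "=" must be JSON (double quotes, no trailing commas),
# because the command line can only deliver strings.
override = "ITEM_PIPELINES=" + json.dumps(pipelines)

# Shell-quoted command line; "myspider" is a placeholder spider name.
command = shlex.join(["scrapy", "crawl", "myspider", "-s", override])
print(command)
```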
0 votes, 2 answers

Post Scrapy returned results to Parse

I made a pipeline with PARSE = 'api.parse.com' PORT = 443. However, I can't find the right way to post the data to Parse, because every time it creates undefined objects in my Parse DB. class Newscrawlbotv01Pipeline(object): def process_item(self,…
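Here is one possible explanation for the "undefined objects" symptom. Parse's REST API expects a JSON body along with the `X-Parse-Application-Id`, `X-Parse-REST-API-Key` and `Content-Type: application/json` headers. An item that is not serialized with something like `json.dumps(dict(item))` will not map onto the object's fields. The sketch only builds the request with the standard library. The class name `NewsItem` and both keys are placeholders. The hosted api.parse.com service has since shut down, so a current setup would send the same request to a self-hosted Parse Server.

```python
import json
import urllib.request

# Placeholders: substitute your own class name and keys.
PARSE_URL = "https://api.parse.com/1/classes/NewsItem"
APP_ID = "YOUR_APP_ID"
REST_KEY = "YOUR_REST_API_KEY"


def build_parse_request(item):
    """Sketch: encode one scraped item as a JSON POST for Parse's REST API."""
    # dict(item) turns a scrapy.Item into a plain dict that json can encode.
    body = json.dumps(dict(item)).encode("utf-8")
    return urllib.request.Request(
        PARSE_URL,
        data=body,
        method="POST",
        headers={
            "X-Parse-Application-Id": APP_ID,
            "X-Parse-REST-API-Key": REST_KEY,
            "Content-Type": "application/json",
        },
    )
```

Calling `urllib.request.urlopen(build_parse_request(item))` inside `process_item` blocks Scrapy's reactor while each request runs. That is tolerable for a small crawl but slows larger ones.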
0 votes, 1 answer

Up scaling image size downloaded by scrapy image pipeline

I'm trying to override the convert_image method inherited from the ImagesPipeline class in my pipeline.py, but it does not work as expected. Actually, I'm just trying to upscale the downloaded images to my requirement of 700px, but the downloaded images…
Andronaute
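If the override resizes with Pillow's `thumbnail()`, as ImagesPipeline's own thumbnail code does, that alone would explain the problem: `thumbnail()` only ever shrinks an image. Upscaling needs `Image.resize()` with an explicitly computed size. The helper below computes that size for a 700px target width. The function name is hypothetical. The exact `convert_image` signature also differs between Scrapy versions, so check yours before overriding it.

```python
def upscale_size(width, height, target_width=700):
    """Return (w, h) scaled so the width equals target_width, keeping aspect ratio.

    Images already at least target_width wide are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if width >= target_width:
        return width, height  # already wide enough; leave it alone
    scale = target_width / width
    return target_width, max(1, round(height * scale))
```

Inside the override, something like `image = image.resize(upscale_size(*image.size))` would apply it before the image is saved.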
0 votes, 2 answers

404: Is there any way to avoid being blocked by a website while scraping with Scrapy?

I was trying to use Scrapy to scrape about 70k items from a website, but every time, after about 200 items had been scraped, this error popped up for all the rest: scrapy] DEBUG: Ignoring response <404 http://www.somewebsite.com/1234>: HTTP status code is not…
snowneji
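Nothing in Scrapy guarantees that a site won't block a crawler, but slowing down often helps. Below is a starting point for settings.py that uses only built-in settings. The numbers are illustrative, not tuned. Adding 404 to `RETRY_HTTP_CODES` only makes sense if the site really answers 404 when it throttles, as the log suggests.

```python
# settings.py (sketch): slow the crawl down and retry throttled responses.
DOWNLOAD_DELAY = 1.0                  # base delay (seconds) between requests to one site
RANDOMIZE_DOWNLOAD_DELAY = True       # default; waits 0.5x to 1.5x DOWNLOAD_DELAY
CONCURRENT_REQUESTS_PER_DOMAIN = 2

AUTOTHROTTLE_ENABLED = True           # adapt the delay to observed server latency
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Treating 404 as transient also retries pages that genuinely do not exist.
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429, 404]
RETRY_TIMES = 3
```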
0 votes, 1 answer

scrapy pipeline error using mysql.connector

I am completely lost. Here's my pipeline. When I run it, I'm getting an error that says File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks current.result = callback(current.result, *args, **kw) …
Nick
0 votes, 2 answers

Scrapy: use pipeline to replace unwanted non-ASCII code

In the Scrapy results, an unwanted non-ASCII character, \u2013 (a.k.a. character 150, or en dash), appeared in the title, as in u'Director/Senior Director \u2013 Pathology'. I am trying to use a pipeline to replace \u2013 with a regular ,. But the following code…
LearnAWK
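Here is a minimal Python 3 sketch of such a pipeline. It assumes the text lives in a `title` field and that the wanted replacement is the comma the question mentions; both are configurable class attributes. On Python 2, as the question's `u'...'` strings suggest, the check would need `isinstance(value, unicode)` and the literal `u"\u2013"`.

```python
class ReplaceEnDashPipeline:
    """Sketch: swap the en dash (U+2013) in selected text fields."""

    fields = ("title",)   # which item fields to clean
    replacement = ","     # the replacement the question asks for

    def process_item(self, item, spider):
        for field in self.fields:
            value = item.get(field)
            if isinstance(value, str):
                # "\u2013" is the en dash; plain str.replace needs no regex.
                item[field] = value.replace("\u2013", self.replacement)
        return item
```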
0 votes, 2 answers

Scrapy Item pipeline for multi spiders

I have two spiders and run them like this: from scrapy.crawler import CrawlerProcess from scrapy.utils.project import get_project_settings settings = get_project_settings() process1 =…
user3430722
-1 votes, 1 answer

Scrapy: Use Feed Exports after custom Item Pipeline without custom Feed Exporter class?

My Spider looks like this: class ExampleSpider(scrapy.Spider): name = 'example' custom_settings = { 'ITEM_PIPELINES': {'img_clear.pipelines.DuplicatesPipeline': 100,}, 'FEEDS': { 'feeds/example/tags.csv': { …
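Feed exports receive items through the `item_scraped` signal. That signal fires only after an item has passed every enabled item pipeline. A pipeline that drops or edits items therefore already shapes what reaches `feeds/example/tags.csv`, with no custom exporter needed. Below is a duplicates filter in the spirit of the Scrapy docs' example. The `id` key field is hypothetical, and the `DropItem` fallback exists only so the sketch runs without Scrapy installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:  # lets this sketch run without Scrapy installed
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class DuplicatesPipeline:
    """Sketch: drop repeated items so feed exports only see the first copy."""

    key_field = "id"  # hypothetical field that identifies an item

    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider):
        key = item.get(self.key_field)
        if key in self.seen:
            # Dropped items never reach the FEEDS exporters.
            raise DropItem(f"Duplicate item: {key!r}")
        self.seen.add(key)
        return item
```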
-1 votes, 1 answer

Python Scrapy: change CSV column name

Scrapy is indeed powerful, but it lacks a basic feature: when exporting to CSV, there is no way to change the column names. By default it outputs the field names defined in the Item, and those names have to be valid Python variable names. However, at times we need to…
Shahid Karimi
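One workaround uses the standard csv module. Write the header row from a mapping of field names to display names, then write the data rows by the original field names. The mapping and field names below are hypothetical. Newer Scrapy releases also document a dict form of `FEED_EXPORT_FIELDS` for renaming output columns; whether it is available depends on the installed version.

```python
import csv
import io

# Hypothetical mapping from Item field names to the headers wanted in the file.
HEADERS = {"job_title": "Job Title", "company_name": "Company"}


def write_renamed_csv(items, out):
    """Sketch: write dict-like items to `out` with custom column headers."""
    writer = csv.writer(out)
    # The header row uses the display names...
    writer.writerow(HEADERS.values())
    for item in items:
        # ...while data rows are looked up by the original field names.
        writer.writerow(item.get(field, "") for field in HEADERS)


buf = io.StringIO()
write_renamed_csv([{"job_title": "Director", "company_name": "Acme"}], buf)
print(buf.getvalue())
```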