Questions tagged [scrapy-pipeline]
218 questions
0
votes
1 answer
scrapy mysql returns empty results
So my problem is that the scraped information won't show up in the database.
My spider works fine when writing the information out, for example to a .json file.
the pipelines.py
import sys
import MySQLdb
import hashlib
from scrapy.exceptions import…

brian
- 1
- 4
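The excerpt above is truncated, so the actual cause isn't known. Two frequent reasons for an empty table are a pipeline that isn't listed in ITEM_PIPELINES and an insert that is never committed. A minimal sketch of the second case follows, assuming sqlite3 as a stand-in for MySQLdb; the class name, table, and item fields are hypothetical:

```python
import sqlite3


class SQLitePipeline:
    """Store each item in a database and commit so the rows persist."""

    def __init__(self, db_path=":memory:"):
        # db_path is a hypothetical parameter for this sketch.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Scrapy calls open_spider once when the spider starts.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (title TEXT, url TEXT)"
        )

    def process_item(self, item, spider):
        # Parameterized query: the driver handles quoting of the values.
        self.conn.execute(
            "INSERT INTO items (title, url) VALUES (?, ?)",
            (item.get("title"), item.get("url")),
        )
        # Without a commit, inserted rows may never reach the database.
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Scrapy calls close_spider once when the spider finishes.
        self.conn.close()
```

With MySQLdb the same shape should apply: connect in open_spider, commit after the insert, and close in close_spider.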
0
votes
1 answer
Scrapy image pipeline does not download images
I'm trying to set up image downloading from web pages using the Scrapy framework and django-item. I think I have done everything as in the docs,
but after calling scrapy crawl I get a log that looks like this:
Scrapy log
I can't find any information there on what…

Max
- 1
- 3
0
votes
1 answer
Scrapy: scrape items from HTML and not from URL
I came across Scrapy with a requirement for both crawling and scraping. But given the application requirements, I decided not to go with a monolithic approach. Everything should be service-based, so I decided to design two services.
Get all urls and…

SangamAngre
- 809
- 8
- 25
0
votes
0 answers
scrapy pipeline can't work
I wrote a few spiders in the same project, all of which call the same pipeline class.
The pipeline code looks like this:
def process_item(self, item, spider):
    if spider.name == 'spider1':
        # do something
    if spider.name == 'spider2':
        # do…

Smith John
- 1,035
- 1
- 10
- 19
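One way to share a single pipeline class across several spiders, as the excerpt above describes, is to branch on spider.name and return the item on every path. This is only a sketch: the excerpt is truncated, so the asker's actual problem isn't known, and the "source" field and its values are hypothetical.

```python
class SharedPipeline:
    """One pipeline class used by several spiders."""

    def process_item(self, item, spider):
        if spider.name == "spider1":
            item["source"] = "first"   # hypothetical per-spider handling
        elif spider.name == "spider2":
            item["source"] = "second"  # hypothetical per-spider handling
        # Return the item on every path so later pipelines still receive it.
        return item
```

Items from any other spider pass through unchanged, because the final return is outside both branches.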
0
votes
2 answers
Scrapy returning blank csv
This is my first time using scrapy and I'm trying to put the information I need into a csv file using the pipeline. Everything seemed to be working fine until I tried to scrape from more than one page and it started to return a blank csv file. I…

Matthew Barnette
- 3
- 3
0
votes
1 answer
How to save crawled web pages in memory using scrapy
I am able to crawl the web using the following Scrapy script
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from lxml import html
from scrapy.contrib.spiders import CrawlSpider, Rule
from…

nand
- 517
- 2
- 13
- 29
0
votes
1 answer
The setting ITEM_PIPELINE can't be overridden from terminal
In my settings.py I have:
ITEM_PIPELINES = {
    'turing.pipelines.InitFieldsNotInitializedPipeline': 299,
    'turing.pipelines.SetNoneIfFieldEmptyPipeline': 300,
    'turing.pipelines.CheckCategoryPipeline': 301,
    …

Luis G. Angel
- 31
- 3
0
votes
2 answers
Post Scrapy returned results to Parse
I made a pipeline
PARSE = 'api.parse.com'
PORT = 443
However, I can't find the right way to post the data to Parse, because every time it creates undefined objects in my Parse DB.
class Newscrawlbotv01Pipeline(object):
    def process_item(self,…

Thomas Simonini
- 93
- 12
0
votes
1 answer
Up scaling image size downloaded by scrapy image pipeline
I'm trying to override the convert_image method in my pipeline.py, inherited from the ImagesPipeline class, but it does not work as expected.
Actually, I'm just trying to upscale the downloaded image to my requirement, 700px, but the downloaded images…

Andronaute
- 379
- 3
- 12
0
votes
2 answers
404: Is there any way to avoid being blocked by website while scraping using scrapy
I was trying to use Scrapy to scrape about 70k items from a website, but every time, after it has scraped about 200 items, this error pops up for the rest:
scrapy] DEBUG: Ignoring response <404 http://www.somewebsite.com/1234>: HTTP status code is not…

snowneji
- 1,086
- 1
- 11
- 25
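If the 404 responses in the excerpt above do come from rate limiting (an assumption; the site may simply not have those pages), slowing the crawl down is a common first step. A hedged settings.py fragment follows, using standard Scrapy setting names with purely illustrative values:

```python
# settings.py (fragment): slow the crawl down; all values are illustrative.
DOWNLOAD_DELAY = 2                  # seconds between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # one request at a time per domain
AUTOTHROTTLE_ENABLED = True         # adapt the delay to observed latency
AUTOTHROTTLE_START_DELAY = 5        # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 60         # upper bound for the adaptive delay
```

Whether this helps depends entirely on how the target site decides to block clients.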
0
votes
1 answer
scrapy pipeline error using mysql.connector
I am completely lost. Here's my pipeline. When I run it, I'm getting an error that says
File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
…

Nick
- 39
- 9
0
votes
2 answers
Scrapy: use pipeline to replace unwanted non-ASCII code
From Scrapy results, one unwanted non-ASCII code \u2013 (aka character(150) or en dash) was in the title, such as u'Director/Senior Director \u2013 Pathology'. I am trying to use a pipeline to replace \u2013 with a regular ,. But the following code…

LearnAWK
- 549
- 6
- 17
0
votes
2 answers
Scrapy Item pipeline for multi spiders
I have 2 spiders and run them here:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
settings = get_project_settings()
process1 =…

user3430722
- 345
- 1
- 5
- 12
-1
votes
1 answer
Scrapy: Use Feed Exports after custom Item Pipeline without custom Feed Exporter class?
My Spider looks like this:
class ExampleSpider(scrapy.Spider):
    name = 'example'
    custom_settings = {
        'ITEM_PIPELINES': {'img_clear.pipelines.DuplicatesPipeline': 100,},
        'FEEDS': {
            'feeds/example/tags.csv': {
                …
-1
votes
1 answer
Python scrapy change csv column name
Scrapy is indeed powerful, but it lacks a basic feature: when exporting to CSV, there is no way to change a column name. By default it outputs the field name defined in the Item, and field names must be valid Python identifiers.
However, at times we need to…

Shahid Karimi
- 4,096
- 17
- 62
- 104
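The column-name question above can be illustrated outside Scrapy with the standard csv module: keep the Python-friendly field names in the items and map them to display headers only when writing. This is a sketch under assumptions, not Scrapy's exporter API; the mapping, field names, and the write_csv helper are hypothetical.

```python
import csv
import io

# Hypothetical mapping from item field names to the desired CSV headers.
HEADER_NAMES = {"product_name": "Product Name", "price_usd": "Price (USD)"}


def write_csv(items, header_names, out):
    """Write items to out, using display names for the header row."""
    fields = list(header_names)  # column order follows the mapping
    writer = csv.writer(out)
    writer.writerow([header_names[f] for f in fields])
    for item in items:
        # Missing fields become empty cells.
        writer.writerow([item.get(f, "") for f in fields])


buffer = io.StringIO()
write_csv([{"product_name": "Widget", "price_usd": "9.99"}], HEADER_NAMES, buffer)
```

Inside a Scrapy project the same mapping idea could be applied in a custom exporter or a post-processing step.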