Questions tagged [scrapy-pipeline]
218 questions
0 votes, 1 answer
My Scrapy item['img_urls'] doesn't download the file
I'm currently working on a student data science project that consists of building a fish recognition system from pictures. We will use TensorFlow to make sense of the data and Scrapy to collect a large amount of data (fish pictures and its scientific…

Jordan Guillonneau
- 13
- 4
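A minimal settings sketch, assuming the project relies on Scrapy's built-in ImagesPipeline: that pipeline reads download URLs from an `image_urls` field by default, so a field named `img_urls` is ignored unless `IMAGES_URLS_FIELD` points at it. The folder name below is a placeholder, and the pipeline also needs Pillow installed.

```python
# settings.py (sketch): enable the built-in ImagesPipeline.
ITEM_PIPELINES = {"scrapy.pipelines.images.ImagesPipeline": 1}

# Hypothetical folder; downloaded images land under <IMAGES_STORE>/full/.
IMAGES_STORE = "fish_images"

# The pipeline reads "image_urls" by default; point it at the custom field instead.
IMAGES_URLS_FIELD = "img_urls"
IMAGES_RESULT_FIELD = "images"
```

The spider then has to yield items where `item['img_urls']` is a list of absolute URLs (for example `[response.urljoin(src)]`), not a single string.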
0 votes, 0 answers
Scrapy best practice: Connect to database in crawler or in pipeline?
I am scraping a main page that has a list of items. Within my pipeline I connect to a database to store the items. My next task is to go to each individual item page and scrape comments. I need to connect to the database again to see if I've already…

Learning C
- 679
- 10
- 27
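One common arrangement, sketched here with sqlite3 standing in for the real database: the pipeline owns a single connection, opened once in `open_spider` and closed in `close_spider`, with an "already seen" lookup living next to the insert. The table layout, the `url`/`title` fields, and the `already_seen` helper are hypothetical.

```python
import sqlite3


class SQLiteStorePipeline:
    """Scrapy-style item pipeline that owns a single database connection (sketch)."""

    def __init__(self, db_path=":memory:"):
        # Hypothetical default; a real project would read this from settings in from_crawler().
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: connect and make sure the table exists.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY, title TEXT)"
        )

    def close_spider(self, spider):
        # Called once when the spider finishes: persist and release the connection.
        self.conn.commit()
        self.conn.close()

    def already_seen(self, url):
        # True if an item with this URL was stored earlier.
        row = self.conn.execute("SELECT 1 FROM items WHERE url = ?", (url,)).fetchone()
        return row is not None

    def process_item(self, item, spider):
        # INSERT OR IGNORE keeps the first stored copy of each URL.
        self.conn.execute(
            "INSERT OR IGNORE INTO items (url, title) VALUES (?, ?)",
            (item["url"], item["title"]),
        )
        return item
```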
0 votes, 1 answer
Why does my pipeline return previously modified items?
I created a pipeline to save each item in Elasticsearch. In this pipeline I check whether the item already exists, to see if an administrator has overridden some field, in order to force a reindex (I take that field and save/override it on the new item).
class…

magexcustomer
- 35
- 6
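Without the full code the cause can only be guessed; one frequent source of "previous" values leaking into later items is shared mutable state, such as a mutable default argument, a class-level dict, or one item object reused between yields. A minimal sketch that builds a fresh document for every item, with a plain dict standing in for Elasticsearch; the `id`/`admin_override` fields and the `OVERRIDABLE` list are hypothetical.

```python
import copy


class OverrideMergePipeline:
    """Keeps admin-overridden fields from a stored doc without sharing state (sketch)."""

    OVERRIDABLE = ("title",)  # hypothetical fields an administrator may override

    def __init__(self, store=None):
        # A plain dict stands in for Elasticsearch; None avoids a shared mutable default.
        self.store = {} if store is None else store

    def process_item(self, item, spider):
        # Fresh deep copy per item, so nothing is carried over between calls.
        doc = copy.deepcopy(dict(item))
        existing = self.store.get(doc["id"])
        if existing and existing.get("admin_override"):
            for field in self.OVERRIDABLE:
                if field in existing:
                    doc[field] = existing[field]
            doc["admin_override"] = True
        self.store[doc["id"]] = doc
        return doc
```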
0 votes, 1 answer
Scrapy Image Pipeline: How to rename images?
I have a spider which fetches both data and images. I want to rename the images with the respective 'title' that I'm fetching.
Following is my code:
spider1.py
from imageToFileSystemCheck.items import ImagetofilesystemcheckItem
import…

Sagar Singh Verma
- 55
- 11
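ImagesPipeline decides the stored file name in its `file_path()` method, so a subclass can override it; as far as I know, Scrapy 2.4 and later also pass the item to that method as a keyword argument, while older versions need the title passed through `request.meta`. The stdlib helper below only sketches the naming part; the slug rules and the short URL hash (added so two items with the same title do not overwrite each other) are assumptions.

```python
import hashlib
import re


def title_to_image_path(title, image_url):
    """Return a path like 'full/<slug>-<hash>.jpg' for an item's image (hypothetical helper)."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower() or "image"
    # A short hash of the URL keeps names unique when titles repeat.
    suffix = hashlib.sha1(image_url.encode("utf-8")).hexdigest()[:8]
    # ImagesPipeline re-encodes images as JPEG, so ".jpg" matches the stored format.
    return f"full/{slug}-{suffix}.jpg"
```

Inside an ImagesPipeline subclass, `file_path(self, request, response=None, info=None, *, item=None)` could then return `title_to_image_path(item['title'], request.url)`.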
0 votes, 0 answers
Scrapy - download an image without compressing it
I am trying to download some images without compression.
For example, http://p1.pstatp.com/origin/433c000159def0223671
This picture is about 2.0 MB, but when I download it using Scrapy it's only 120 KB.
settings.py
BOT_NAME = 'toutiao'
SPIDER_MODULES =…

咸蛋超人
- 45
- 1
- 6
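One plausible cause, not confirmed by the question: ImagesPipeline converts every image to JPEG before saving, which can shrink a large original considerably. If the exact original bytes are needed, FilesPipeline stores the downloaded body unchanged. A settings sketch; the folder name is a placeholder.

```python
# settings.py (sketch): keep the original bytes by using FilesPipeline instead of ImagesPipeline.
ITEM_PIPELINES = {"scrapy.pipelines.files.FilesPipeline": 1}

# Hypothetical folder for the downloaded originals.
FILES_STORE = "toutiao_originals"

# FilesPipeline reads URLs from the "file_urls" item field and writes results to "files".
```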
0 votes, 2 answers
Scraping multiple tables and storing each table header as rows in CSV
I'm trying to scrape multiple tables, each of which has its name stored in an h3 tag. There are columns of data I can scrape without a problem, and when I feed in the next URL I can append this data to the CSV file.
The problem I can't solve is to get the table…

tomoc4
- 337
- 2
- 10
- 29
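A minimal sketch of the CSV side only, assuming the spider has already collected each table as (h3 name, header cells, rows) and that all tables share the same columns: the h3 name is written as an extra first column on every row, so the output stays one flat CSV. The function name and the "table" column label are hypothetical.

```python
import csv
import io


def tables_to_csv(tables):
    """tables: iterable of (table_name, header, rows). Returns the CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    header_written = False
    for name, header, rows in tables:
        if not header_written:
            # Write the shared header once, with the table name as the first column.
            writer.writerow(["table"] + list(header))
            header_written = True
        for row in rows:
            # Repeat the h3 table name on every data row.
            writer.writerow([name] + list(row))
    return buf.getvalue()
```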
0 votes, 0 answers
Scrapy log HTTP errors to database or pipeline
I'm trying to get a full picture of my crawls in a database (MySQL).
So I need anything from the errbacks to be logged to the database.
Is it possible to pass errback results to the pipelines?
I currently have it set up like so:
Response -> (Item) -> Pipeline
When…

Akustik
- 43
- 8
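One possible approach, hedged: in the Scrapy versions I'm aware of, whatever an errback returns is processed like callback output, so the errback can yield a plain dict describing the failure and a single pipeline can route it to an errors table (worth verifying for your version). A sketch with in-memory sqlite3 standing in for MySQL; the "kind" marker and both table layouts are hypothetical.

```python
import sqlite3


class CrawlLogPipeline:
    """Routes normal items and error records to separate tables (sketch)."""

    def open_spider(self, spider):
        # In-memory SQLite stands in for the MySQL connection here.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE pages (url TEXT, title TEXT)")
        self.conn.execute("CREATE TABLE errors (url TEXT, status INTEGER, reason TEXT)")

    def process_item(self, item, spider):
        if item.get("kind") == "error":
            # Hypothetical error record yielded by an errback.
            self.conn.execute(
                "INSERT INTO errors VALUES (?, ?, ?)",
                (item["url"], item.get("status"), item.get("reason")),
            )
        else:
            self.conn.execute(
                "INSERT INTO pages VALUES (?, ?)", (item["url"], item.get("title"))
            )
        return item

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()
```

On the spider side, the errback could check `failure.check(HttpError)`, read `failure.value.response`, and yield `{'kind': 'error', 'url': response.url, 'status': response.status, 'reason': 'HTTP error'}`.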
0 votes, 1 answer
How to scrape tens of thousands of URLs every night using Scrapy
I am using Scrapy to scrape some big brands to import the sale data for my site. Currently I am using
DOWNLOAD_DELAY = 1.5
CONCURRENT_REQUESTS_PER_DOMAIN = 16
CONCURRENT_REQUESTS_PER_IP = 16
I am using Item Loaders to specify CSS/XPath rules and…

mmrs151
- 3,924
- 2
- 34
- 38
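A rough back-of-the-envelope check, assuming DOWNLOAD_DELAY spaces out consecutive requests to the same domain (so CONCURRENT_REQUESTS_PER_DOMAIN cannot raise throughput while a delay is set) and ignoring response times: the delay alone puts a lower bound on how long a nightly run takes. The URL and domain counts are hypothetical.

```python
def min_crawl_hours(num_urls, download_delay, num_domains=1):
    """Lower bound on crawl time when each domain gets one request per download_delay seconds."""
    urls_per_domain = num_urls / num_domains  # assumes URLs are spread evenly across domains
    return urls_per_domain * download_delay / 3600


# 40,000 URLs on one domain at DOWNLOAD_DELAY = 1.5 s
print(round(min_crawl_hours(40_000, 1.5), 1))  # 16.7
# The same 40,000 URLs spread over 4 brand sites
print(round(min_crawl_hours(40_000, 1.5, num_domains=4), 1))  # 4.2
```

With RANDOMIZE_DOWNLOAD_DELAY enabled (the default), each wait is drawn between 0.5× and 1.5× the delay, so the average stays close to this estimate.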
0 votes, 0 answers
Skip downloading but not other tasks in a Scrapy pipeline
Is there a way I can skip downloading a webpage but still have the later parts of the pipeline execute?
Currently, I read a file of JSON objects in start_requests; each JSON object has a website URL and other data fields. If a website URL is not…

comiventor
- 3,922
- 5
- 50
- 77
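A minimal sketch of one way to split the work up front, assuming JSON-lines input with a hypothetical "url" field: records with a usable URL become requests, and the rest are set aside so the spider can emit them as items without a download, which still sends them through the remaining pipeline stages. How those pass-through records get emitted depends on the Scrapy version and is not shown here.

```python
import json
from urllib.parse import urlparse


def split_records(lines):
    """Split JSON-lines records into (records_to_fetch, records_without_url)."""
    to_fetch, passthrough = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        url = record.get("url") or ""
        # Treat only absolute http(s) URLs as downloadable.
        if urlparse(url).scheme in ("http", "https"):
            to_fetch.append(record)
        else:
            passthrough.append(record)
    return to_fetch, passthrough
```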
0 votes, 0 answers
Scrapy data not being written to database
The spider and pipeline are running fine, but the database still shows an empty set.
Here is the pipeline code. I am using Python 2.7 and a MySQL database.
from twisted.enterprise import adbapi
class MysqlWriter(object):
    def __init__(self):
        …

Sonali Singh
- 9
- 3
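Two common culprits are a pipeline missing from ITEM_PIPELINES and a transaction that is never committed. The sketch below, with sqlite3 standing in for MySQL, demonstrates the second: rows inserted without `commit()` are invisible to any other connection, such as the client used to inspect the table. Note that `adbapi.ConnectionPool.runInteraction` commits automatically when the interaction function returns without raising, so this mainly applies to code that writes through its own connection or cursor.

```python
import os
import sqlite3
import tempfile


def rows_seen_by_other_connection(commit):
    """Insert one row, optionally commit, then count rows from a second connection."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE items (title TEXT)")
    writer.commit()
    # sqlite3 opens a transaction implicitly before this INSERT.
    writer.execute("INSERT INTO items VALUES (?)", ("fish",))
    if commit:
        writer.commit()
    reader = sqlite3.connect(path)
    count = reader.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    reader.close()
    writer.close()  # closing without commit discards the pending insert
    return count


print(rows_seen_by_other_connection(commit=False))  # 0
print(rows_seen_by_other_connection(commit=True))   # 1
```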
0 votes, 2 answers
Scrapy and Python: 'Response' object has no attribute 'xpath'
EDIT 2 - Because my folders got mixed up with the names I chose, I accidentally posted the wrong code. Please see below for the accurate code of each file in the correct folder containing all my files for this.
Settings
# -*- coding: utf-8 -*-
# Scrapy…

mlclm
- 725
- 6
- 16
- 38
0 votes, 2 answers
Scrapy and Celery `update_state`
I have the following setup (Docker):
Celery, linked to the Flask setup, which runs the Scrapy spider
The Flask setup (obviously)
The Flask setup gets a request for Scrapy -> fires up a worker to do some work
Now I wish to update the original Flask setup on the…

WiseStrawberry
- 317
- 1
- 4
- 14
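A sketch of the Scrapy side only: a pipeline that counts items and reports every N of them through a callback attached to the spider. The `report_progress` attribute and the interval are hypothetical; spider keyword arguments become attributes, so a Celery task running the crawl in its own process could pass something like `report_progress=lambda n: self.update_state(state='PROGRESS', meta={'items': n})`, but that wiring and the 'PROGRESS' state name are assumptions.

```python
class ProgressPipeline:
    """Calls spider.report_progress(count) every `every` items (hypothetical hook)."""

    every = 10  # hypothetical reporting interval

    def open_spider(self, spider):
        self.count = 0

    def process_item(self, item, spider):
        self.count += 1
        report = getattr(spider, "report_progress", None)
        if report is not None and self.count % self.every == 0:
            report(self.count)
        return item

    def close_spider(self, spider):
        # Send a final report so the total is always delivered.
        report = getattr(spider, "report_progress", None)
        if report is not None:
            report(self.count)
```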
0 votes, 1 answer
Python + Scrapy renaming downloaded images
IMPORTANT NOTE: all the answers currently available on Stack Overflow are for previous versions of Scrapy and don't work with the latest version, Scrapy 1.4.
Totally new to Scrapy and Python, I am trying to scrape some pages and download the…

mlclm
- 725
- 6
- 16
- 38
0 votes, 1 answer
Scrapy merge output on a field
I have a Scrapy output like this:
[{'gender': 'women',
'name': 'NEW IN: CLOTHING',
'products': [{'name': 'Free People Cocoon Multi Way Neck Top',
'price': {'currency': 'GBP',
'outlet': '40.0',
…
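Assuming the goal is to collapse entries that share a value (here 'name', though 'gender' or both may be the right key) into one entry whose 'products' lists are concatenated, a plain-Python post-processing step could look like this; in Scrapy it could run in a pipeline's `close_spider` or on the exported JSON afterwards. The function name and the sample rows are hypothetical.

```python
def merge_on(records, key, list_field):
    """Merge records sharing record[key], concatenating their list_field lists."""
    merged = {}
    for rec in records:
        k = rec[key]
        if k not in merged:
            # Copy the record and its list so the input is not mutated.
            merged[k] = {**rec, list_field: list(rec[list_field])}
        else:
            merged[k][list_field].extend(rec[list_field])
    return list(merged.values())  # dicts keep insertion order (Python 3.7+)


rows = [
    {"gender": "women", "name": "NEW IN: CLOTHING",
     "products": [{"name": "Free People Cocoon Multi Way Neck Top"}]},
    {"gender": "women", "name": "NEW IN: CLOTHING",
     "products": [{"name": "Hypothetical Dress"}]},
]
print(merge_on(rows, "name", "products"))
```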
0 votes, 1 answer
Data crawling using the Scrapy package in Python
I'm trying to get some data, including images, from a website (IMDb) using the 'scrapy' package.
If there is an image_URL in the div class, I'm able to crawl the data with the movie poster. However, if not, my code doesn't work properly. It skipped some data associate…
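A frequent reason for silently skipped entries is code that indexes into a selector result (for example `[0]`) or only builds the item inside an "image found" branch. A sketch, with hypothetical field names, that keeps every movie and makes the poster optional: Scrapy's `SelectorList.get()` returns None (or a given default) when nothing matches, which feeds directly into this, and with ImagesPipeline an empty `image_urls` list should simply produce an item with no downloaded images.

```python
def build_movie_item(title, poster_url=None):
    """Build an item whether or not a poster URL was found (hypothetical field names)."""
    item = {"title": title.strip()}
    # An empty list tells ImagesPipeline there is nothing to download for this movie.
    item["image_urls"] = [poster_url] if poster_url else []
    return item
```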