Questions tagged [scrapy-pipeline]
218 questions
2
votes
1 answer
Scrapy not calling the assigned pipeline when run from a script
I have a piece of code to test scrapy. My goal is to use scrapy without having to call the scrapy command from the terminal, so I can embed this code somewhere else.
The code is the following:
from scrapy import Spider
from scrapy.selector import…

Santi Peñate-Vera
- 1,053
- 4
- 33
- 68
2
votes
1 answer
Check if id exists in MongoDB with pymongo and scrapy
I have set up a spider with scrapy that sends data to a MongoDB database. I want to check to see if the id exists so that if it does I can $addToSet on a specific key (otherwise Mongo will reject the insert because the _id already exists).
This is…

Eitan
- 163
- 1
- 4
- 12
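Rather than querying for the `_id` first, a single `update_one` with `upsert=True` handles both cases in one server-side operation: `$addToSet` appends to an existing document, and the upsert creates the document when the `_id` is new. A sketch written against pymongo's `update_one` signature (collection and field names are assumptions):

```python
def add_tags(collection, doc_id, tags):
    # One round trip instead of find_one + insert/update:
    # - if _id exists, $addToSet appends only the values not yet present;
    # - if it does not, upsert=True creates the document.
    return collection.update_one(
        {"_id": doc_id},
        {"$addToSet": {"tags": {"$each": list(tags)}}},
        upsert=True,
    )
```

Because the operation is atomic on the server, it also avoids the race where another insert lands between the existence check and the write.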
1
vote
1 answer
mysql error python -mysql.connector.errors.ProgrammingError: Not enough parameters for the SQL statement
I'm running a class in the pipeline to save my Scrapy output to a database, but I'm getting an error:
File "C:\Users\BRAINBOX\Downloads\freeCodeCamp-part-6-main\freeCodeCamp-part6\bookscraper\pipelines.py", line 63, in…

Daniel
- 43
- 5
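mysql.connector raises "Not enough parameters for the SQL statement" when the values passed to `cursor.execute()` don't line up with the `%s` placeholders, most often because a bare string was passed instead of a tuple. A sketch (table and column names are illustrative; a one-element parameter list needs a trailing comma, `("title",)`):

```python
sql = "INSERT INTO books (title, price, stock) VALUES (%s, %s, %s)"

# Wrong: cursor.execute(sql, item["title"]) -- a lone string is treated
# as a sequence of characters, so the parameter count never matches.
# Right: one value per placeholder, wrapped in a tuple:
params = ("A Light in the Attic", 51.77, 22)

assert sql.count("%s") == len(params)  # cheap sanity check before execute()
# cursor.execute(sql, params)
# connection.commit()
```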
1
vote
2 answers
Why is the Scrapy image pipeline not downloading images?
I am trying to download all the images from the product gallery. I have tried the mentioned script, but somehow I am not able to download the images. I managed to download the main image, which contains an id; the other images from the gallery do…

Raisul Islam
- 277
- 2
- 19
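A frequent cause of this symptom: the gallery `<img>` tags hold relative or protocol-relative `src` values (or the real URL sits in a lazy-load attribute such as `data-src`), and `ImagesPipeline` only downloads absolute URLs, skipping the rest without an obvious error. A sketch of the URL clean-up step (page URL and paths are illustrative):

```python
from urllib.parse import urljoin

def gallery_image_urls(page_url, srcs):
    # Join every candidate against the page URL so relative ("/img/a.jpg")
    # and protocol-relative ("//cdn...") sources become absolute.
    return [urljoin(page_url, s) for s in srcs if s]

urls = gallery_image_urls(
    "https://shop.example.com/product/42",
    ["/img/a.jpg", "//cdn.example.com/b.jpg", None],
)
# In the spider: yield {"image_urls": urls}, with ImagesPipeline enabled
# in ITEM_PIPELINES and IMAGES_STORE set in settings.
```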
1
vote
1 answer
Get crawl_args from ScrapyRt and print them in the terminal
I have this spider:
import scrapy
import re
import pandas as pd
from scrapy_engine.items import PjudItem
class updateSeekerSpider(scrapy.Spider):
    name = "updateSeeker"
    baseUrl =…
1
vote
0 answers
Scrapy Stop following requests for a specific target
My Scrapy spider has a bunch of independent target links to crawl.
def start_requests(self):
    search_targets = get_search_targets()
    for search in search_targets:
        request = get_request(search.contract_type, search.postal_code, 1)
        …

Danial
- 362
- 4
- 18
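A request chain in Scrapy exists only because each callback yields the next request, so stopping one target means simply not yielding its follow-up; the other targets' chains are untouched. A minimal sketch of that decision step (`MAX_PAGES` and the empty-page stop condition are assumptions, and the tuple stands in for `get_request(...)` from the snippet above):

```python
MAX_PAGES = 50  # assumed per-target cap

def next_request_args(search, page, items):
    # Called from parse() with the items found on the current page.
    # Yielding nothing ends this target's chain; an empty page is used
    # here as the per-target stop signal.
    if items and page < MAX_PAGES:
        yield (search, page + 1)  # stand-in for get_request(...)

# A target that still returns results keeps paginating:
assert list(next_request_args("75011", 3, ["listing"])) == [("75011", 4)]
# An exhausted target stops, independently of the others:
assert list(next_request_args("75011", 3, [])) == []
```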
1
vote
1 answer
Scrapy returns "None" as result some of the time but not all the time
I have created a Scrapy crawl spider with Rule objects and linkextractor to scrape Amazon bestsellers products but Scrapy sometimes returns None as result even though I know for sure that my xpath expressions are correct. It's so weird that Scrapy…

Avn
- 31
- 1
1
vote
0 answers
Scrapy sends multiple Documents to Elastic
We use Scrapy to crawl a website where you need to be logged in.
There is one website with different pages to crawl, so we have, for example, 3 different spiders but need only one login.
So we tried to use one driver for all spiders, and we need to run…

Ramona K.
- 23
- 6
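When several spiders (or re-runs) index the same page, Elasticsearch auto-generates a fresh `_id` for each send, which produces duplicates. Giving each document a deterministic `_id` derived from its URL makes repeated sends overwrite the existing document instead of multiplying it; a sketch (the `url` field and index name are assumptions):

```python
import hashlib

def elastic_id(item):
    # Same URL -> same _id, so re-indexing updates the existing
    # document instead of creating another copy.
    return hashlib.sha1(item["url"].encode("utf-8")).hexdigest()

doc_id = elastic_id({"url": "https://example.com/page/1"})
# es.index(index="crawl", id=doc_id, document=item)  # elasticsearch-py
```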
1
vote
0 answers
Scrapy upload files to dynamically created directories in S3 based on field
I've been experimenting with Scrapy for some time now and recently have been trying to upload files (data and images) to an S3 bucket. If the directory is static, it is pretty straightforward and I didn't hit any roadblocks. But what I want to…

scrapy_learner
- 11
- 1
1
vote
1 answer
Trying to download files from a .py file without starting a Scrapy project; created a custom pipeline within the Python file, and the mentioned error occurs
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.pipelines.files import FilesPipeline
from urllib.parse import urlparse
import os
class DatasetItem(scrapy.Item):
    file_urls = scrapy.Field()
    files = scrapy.Field()
class…

Laxman Maharjan
- 23
- 3
1
vote
0 answers
Save downloaded files with custom names in scrapy
I am new to Scrapy. I downloaded some files using the code below, and I want to change the names of the downloaded files, but I don't know how.
For example, I want to have a list containing names and use it to rename the files that I downloaded.
Any help…

Pito
- 61
- 2
1
vote
1 answer
How to run multiple spiders through individual pipelines?
Total noob just getting started with scrapy.
My directory structure looks like this...
#FYI: running on Scrapy 2.4.1
WebScraper/
    Webscraper/
        spiders/
            spider.py  # (NOTE: contains spider1 and spider2 classes.)
        items.py
        …

yeqiuuu
- 97
- 1
- 7
1
vote
0 answers
Exception raised in file_path function in Scrapy pipeline is not shown
So, I put a simple exception in an image pipeline like this:
class MyImagesPipeline(ImagesPipeline):
    # Name download version
    def file_path(self, request, response=None, info=None):
        raise Exception()
        print("It gets into…

Aminah Nuraini
- 18,120
- 8
- 90
- 108
1
vote
1 answer
CsvItemExporter for multiple files in custom item pipeline not exporting all items
I have created an item pipeline as an answer to this question.
It is supposed to create a new file for every page according to the page_no value set in the item. This works mostly fine.
The problem is with the last csv file generated by the…

Patrick Klein
- 1,161
- 3
- 10
- 23
1
vote
2 answers
Organizing scraped data based on the URL the data came from
I am creating a Scrapy program to scrape profile pages for numerical data. Each profile has a section that lists the different tags that the user uses. Each of those tags link to a paginated set of pages that shows all the posts made under that tag,…

harada
- 183
- 1
- 9
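Since every response knows its own URL, the grouping key can be recovered from `response.url` in the callback (or passed forward explicitly via `cb_kwargs`). A sketch of the URL-parsing variant; the `/users/<name>/tags/<tag>/page/<n>` path layout is an assumption:

```python
from urllib.parse import urlparse

def tag_from_url(url):
    # Assumed layout: /users/<name>/tags/<tag>/page/<n>
    parts = urlparse(url).path.strip("/").split("/")
    if "tags" in parts:
        i = parts.index("tags")
        if i + 1 < len(parts):
            return parts[i + 1]
    return None

tag = tag_from_url("https://example.com/users/ann/tags/python/page/2")
# In parse(): item["tag"] = tag_from_url(response.url), so each scraped
# row carries the tag page it came from and can be grouped afterwards.
```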