Questions tagged [scrapy-pipeline]

218 questions
2 votes • 1 answer

Scrapy not calling the assigned pipeline when run from a script

I have a piece of code to test scrapy. My goal is to use scrapy without having to call the scrapy command from the terminal, so I can embed this code somewhere else. The code is the following: from scrapy import Spider from scrapy.selector import…
Santi Peñate-Vera • 1,053
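The usual cause here: pipelines listed in a project's settings.py are never seen when the spider is started from a plain script, so they must be registered explicitly. A minimal sketch (the quotes.toscrape.com URL and the `__main__.` path are assumptions; the path only resolves when the file is run directly):

```python
class CollectPipeline:
    """A plain item pipeline; Scrapy only requires a process_item method."""
    items = []

    def process_item(self, item, spider):
        self.items.append(item)  # appends to the shared class-level list
        return item


def run():
    # Scrapy is imported lazily so this sketch loads even without it installed.
    import scrapy
    from scrapy.crawler import CrawlerProcess

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]
        # When running from a script there is no project settings.py, so the
        # pipeline must be registered here or it is never called.
        custom_settings = {"ITEM_PIPELINES": {"__main__.CollectPipeline": 300}}

        def parse(self, response):
            for quote in response.css("div.quote"):
                yield {"text": quote.css("span.text::text").get()}

    process = CrawlerProcess(settings={"LOG_LEVEL": "ERROR"})
    process.crawl(QuotesSpider)
    process.start()  # blocks until the crawl finishes
```

Calling `run()` starts the crawl; afterwards `CollectPipeline.items` holds everything the pipeline saw.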
2 votes • 1 answer

Check if id exists in MongoDB with pymongo and scrapy

I have set up a spider with scrapy that sends data to a MongoDB database. I want to check to see if the id exists so that if it does I can $addToSet on a specific key (otherwise Mongo will reject the insert because the _id already exists). This is…
Eitan • 163
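An explicit existence check isn't needed: a single upsert covers both cases. A sketch with assumed field, database, and collection names:

```python
def upsert_spec(item):
    """Build the (filter, update) pair for an upsert: if the _id already
    exists, $addToSet extends the tags; otherwise the document is created.
    The field names here are assumptions about the item's shape."""
    return (
        {"_id": item["_id"]},
        {"$addToSet": {"tags": {"$each": item.get("tags", [])}}},
    )


class MongoPipeline:
    def open_spider(self, spider):
        # pymongo is imported lazily so the sketch loads without it installed.
        from pymongo import MongoClient
        self.client = MongoClient("mongodb://localhost:27017")  # assumed URI
        self.col = self.client["scraped"]["items"]              # assumed names

    def process_item(self, item, spider):
        flt, update = upsert_spec(dict(item))
        # upsert=True avoids the DuplicateKeyError a plain insert would raise
        self.col.update_one(flt, update, upsert=True)
        return item

    def close_spider(self, spider):
        self.client.close()
```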
1 vote • 1 answer

MySQL error in Python: mysql.connector.errors.ProgrammingError: Not enough parameters for the SQL statement

I'm running a class in the pipeline to save my Scrapy output to a database, but I'm getting an error: File "C:\Users\BRAINBOX\Downloads\freeCodeCamp-part-6-main\freeCodeCamp-part6\bookscraper\pipelines.py", line 63, in…
Daniel • 43
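This ProgrammingError usually means the number of `%s` placeholders doesn't match the parameter sequence (or a bare string was passed where a tuple is expected). A sketch with an assumed table schema and assumed connection credentials:

```python
INSERT_SQL = "INSERT INTO books (title, price, url) VALUES (%s, %s, %s)"


def item_to_params(item):
    """Build the parameter tuple; its length must match the number of %s
    placeholders in INSERT_SQL, and it must be a tuple, not a string."""
    return (item["title"], item["price"], item["url"])


class MySQLPipeline:
    def open_spider(self, spider):
        import mysql.connector  # lazy import; credentials below are assumptions
        self.conn = mysql.connector.connect(
            host="localhost", user="root", password="", database="bookscraper"
        )
        self.cur = self.conn.cursor()

    def process_item(self, item, spider):
        self.cur.execute(INSERT_SQL, item_to_params(item))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```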
1 vote • 2 answers

Why is the Scrapy image pipeline not downloading images?

I am trying to download all the images from the product gallery. I have tried the mentioned script but somehow I am not able to download the images. I could manage to download the main image which contains an id. The other images from the gallery do…
Raisul Islam • 277
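Two frequent causes when only the main image downloads: the gallery thumbnails are relative or lazy-loaded URLs (often in `data-src` rather than `src`), and the ImagesPipeline is silently skipped when `IMAGES_STORE` is unset. A sketch (the selectors are assumptions about the page):

```python
from urllib.parse import urljoin

# The ImagesPipeline only acts when it is enabled AND IMAGES_STORE is set;
# without IMAGES_STORE it is silently skipped.
IMAGE_SETTINGS = {
    "ITEM_PIPELINES": {"scrapy.pipelines.images.ImagesPipeline": 1},
    "IMAGES_STORE": "images",
}


def absolutize(base_url, srcs):
    """Gallery srcs are often relative; image_urls must contain absolute
    URLs or nothing is downloaded. Empty/None entries are dropped."""
    return [urljoin(base_url, s) for s in srcs if s]

# In the spider's parse():
#   srcs = response.css("div.gallery img::attr(data-src), "
#                       "div.gallery img::attr(src)").getall()
#   yield {"image_urls": absolutize(response.url, srcs)}
```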
1 vote • 1 answer

Get crawl_args from ScrapyRt and print them in the terminal

I have this spider: import scrapy import re import pandas as pd from scrapy_engine.items import PjudItem class updateSeekerSpider(scrapy.Spider): name = "updateSeeker" baseUrl =…
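ScrapyRt decodes the `crawl_args` parameter (URL-encoded JSON in the request) and passes each key as a keyword argument to the spider's `__init__`, so printing them there makes them show up in the terminal running ScrapyRt. A sketch (a plain class stands in for the `scrapy.Spider` subclass; the argument name is an assumption):

```python
class UpdateSeekerSketch:
    """Stand-in for the updateSeeker spider: ScrapyRt passes each key of
    the crawl_args JSON as a keyword argument to __init__."""
    name = "updateSeeker"

    def __init__(self, *args, **kwargs):
        self.crawl_args = kwargs
        print("crawl_args:", kwargs)  # visible in the ScrapyRt terminal

# Example request (crawl_args must be URL-encoded JSON), assuming the
# default ScrapyRt port:
#   curl "http://localhost:9080/crawl.json?spider_name=updateSeeker&crawl_args=%7B%22case_id%22%3A%2242%22%7D"
```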
1 vote • 0 answers

Scrapy: stop following requests for a specific target

My Scrapy spider has a bunch of independent target links to crawl. def start_requests(self): search_targets = get_search_targets() for search in search_targets: request = get_request(search.contract_type, search.postal_code, 1) …
Danial • 362
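Since each target drives its own chain of requests, one chain can be ended simply by not yielding a follow-up request for it, without touching the others. A sketch with assumed stop conditions:

```python
def next_page(search_state, page_items):
    """Decide the next page number for one target, or None to stop.
    Each target tracks its own state, so ending one chain of requests
    does not affect the others. The stop conditions are assumptions."""
    if not page_items:  # empty page: this target is exhausted
        return None
    if search_state["page"] >= search_state["max_pages"]:
        return None
    return search_state["page"] + 1

# In parse(), simply not yielding a follow-up Request ends that target:
#   nxt = next_page(response.meta["state"], items)
#   if nxt is not None:
#       yield get_request(search.contract_type, search.postal_code, nxt)
```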
1 vote • 1 answer

Scrapy returns "None" as result some of the time but not all the time

I have created a Scrapy crawl spider with Rule objects and a LinkExtractor to scrape Amazon bestseller products, but Scrapy sometimes returns None as a result even though I know for sure that my XPath expressions are correct. It's so weird that Scrapy…
Avn • 31
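Intermittent `None` results on Amazon often mean the site served a different page layout (or an anti-bot page) for some requests, so an XPath that is correct for one layout finds nothing on the others. One way to cope is to try several extraction variants in order; a sketch with illustrative XPaths:

```python
def first_match(getters):
    """Try several extraction callables in order and return the first
    non-empty result, stripped if it is a string."""
    for get in getters:
        value = get()
        if value:
            return value.strip() if isinstance(value, str) else value
    return None

# In the spider (the XPaths are illustrative, not Amazon's real markup):
#   title = first_match([
#       lambda: response.xpath('//span[@id="productTitle"]/text()').get(),
#       lambda: response.xpath("//h1//text()").get(),
#   ])
```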
1 vote • 0 answers

Scrapy sends duplicate documents to Elasticsearch

We use Scrapy to crawl a website that requires login. There is one website with different pages to crawl, so we have, for example, 3 different spiders and need just one login. So we tried to use one driver for all spiders, and we need to run…
Ramona K. • 23
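When several spiders index overlapping pages, duplicates can be avoided by deriving a deterministic `_id` so re-indexing overwrites the same document instead of creating a new one. A sketch (the `url` field and index name are assumptions):

```python
import hashlib


def es_doc_id(item):
    """Derive a stable _id from the item's URL so re-crawls and
    overlapping spiders overwrite the same Elasticsearch document
    instead of indexing duplicates."""
    return hashlib.sha1(item["url"].encode("utf-8")).hexdigest()

# In the pipeline, index with an explicit id (elasticsearch-py sketch):
#   es.index(index="pages", id=es_doc_id(item), document=dict(item))
```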
1 vote • 0 answers

Scrapy upload files to dynamically created directories in S3 based on field

I've been experimenting with Scrapy for some time now and recently have been trying to upload files (data and images) to an S3 bucket. If the directory is static, it is pretty straightforward and I didn't hit any roadblocks. But what I want to…
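The path under `FILES_STORE` comes from the pipeline's `file_path` method, so a dynamic S3 directory can be built from an item field. A sketch (the bucket name and the `category` field are assumptions; `file_path` has received the item since Scrapy 2.4):

```python
def dynamic_path(item, filename):
    """Build the key under FILES_STORE from an item field; with
    FILES_STORE = "s3://my-bucket/" the file lands at
    s3://my-bucket/<category>/<filename>."""
    return f"{item['category']}/{filename}"


def make_pipeline():
    from scrapy.pipelines.files import FilesPipeline  # lazy import

    class DynamicS3FilesPipeline(FilesPipeline):
        def file_path(self, request, response=None, info=None, *, item=None):
            return dynamic_path(item, request.url.rsplit("/", 1)[-1])

    return DynamicS3FilesPipeline
```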
1 vote • 1 answer

Trying to download files from a .py file without starting a Scrapy project; created a custom pipeline within the Python file, and the mentioned error occurs

import scrapy from scrapy.crawler import CrawlerProcess from scrapy.pipelines.files import FilesPipeline from urllib.parse import urlparse import os class DatasetItem(scrapy.Item): file_urls = scrapy.Field() files = scrapy.Field() class…
1 vote • 0 answers

Save downloaded files with custom names in scrapy

I am new to Scrapy. I downloaded some files using the code below. I want to change the names of my downloaded files but I don't know how. For example, I want to have a list containing names and use it to rename the files that I downloaded. Any help…
Pito • 61
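Download names are controlled by overriding `file_path` in a `FilesPipeline` subclass; the spider can attach the desired name to each item. A sketch (the `file_name` field is an assumption; the item is passed to `file_path` since Scrapy 2.4):

```python
import os
from urllib.parse import urlparse


def renamed(url, new_name):
    """Keep the downloaded file's extension but use a caller-chosen name,
    under the pipeline's default "full/" directory."""
    ext = os.path.splitext(urlparse(url).path)[1]
    return f"full/{new_name}{ext}"


def make_renaming_pipeline():
    from scrapy.pipelines.files import FilesPipeline  # lazy import

    class RenamingFilesPipeline(FilesPipeline):
        def file_path(self, request, response=None, info=None, *, item=None):
            return renamed(request.url, item["file_name"])

    return RenamingFilesPipeline
```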
1 vote • 1 answer

How to run multiple spiders through individual pipelines?

Total noob just getting started with Scrapy. My directory structure looks like this… #FYI: running on Scrapy 2.4.1 WebScraper/ Webscraper/ spiders/ spider.py # (NOTE: contains spider1 and spider2 classes.) items.py …
yeqiuuu • 97
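Each spider can opt into its own pipeline via `custom_settings`, so both can live in one project without sharing pipelines. A sketch (the `processed_by` field and the `webscraper.pipelines` path are assumptions):

```python
class Spider1Pipeline:
    def process_item(self, item, spider):
        item["processed_by"] = "spider1"  # illustrative marker
        return item


class Spider2Pipeline:
    def process_item(self, item, spider):
        item["processed_by"] = "spider2"
        return item

# In spiders/spider.py, each spider names its own pipeline:
#
#   class Spider1(scrapy.Spider):
#       name = "spider1"
#       custom_settings = {
#           "ITEM_PIPELINES": {"webscraper.pipelines.Spider1Pipeline": 300},
#       }
#
#   class Spider2(scrapy.Spider):
#       name = "spider2"
#       custom_settings = {
#           "ITEM_PIPELINES": {"webscraper.pipelines.Spider2Pipeline": 300},
#       }
```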
1 vote • 0 answers

Exception raised in file_path function in Scrapy pipeline not shown

So, I put a simple exception in an image pipeline like this: class MyImagesPipeline(ImagesPipeline): #Name download version def file_path(self, request, response=None, info=None): raise Exception() print("It get's into…
Aminah Nuraini • 18,120
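Media pipelines call `file_path` inside Twisted callbacks, where an uncaught exception tends to surface only as a logged Failure (or not at all at higher log levels) rather than crashing the crawl. One way to make such failures visible is to catch and log them explicitly; a sketch with an assumed path scheme:

```python
import logging

logger = logging.getLogger("images_pipeline")


def logged_file_path(url):
    """Compute a download path, logging instead of silently losing
    exceptions, and falling back to a placeholder name."""
    try:
        return "full/" + url.rsplit("/", 1)[1]
    except Exception:
        logger.exception("file_path failed for %s", url)
        return "full/unknown"

# In the pipeline subclass:
#   def file_path(self, request, response=None, info=None, *, item=None):
#       return logged_file_path(request.url)
```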
1 vote • 1 answer

CsvItemExporter for multiple files in custom item pipeline not exporting all items

I have created an item pipeline as an answer to this question. It is supposed to create a new file for every page according to the page_no value set in the item. This works mostly fine. The problem is with the last csv file generated by the…
Patrick Klein • 1,161
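A likely culprit when only the last CSV is incomplete: `CsvItemExporter` buffers rows, so every exporter must be finished and its file closed in `close_spider`, or the final file's buffered rows never reach disk. A sketch (file naming is an assumption):

```python
class PerPageCsvPipeline:
    """One CSV file per page_no value on the item."""

    def open_spider(self, spider):
        self.exporters = {}  # page_no -> (file, exporter)

    def _exporter_for(self, page_no):
        from scrapy.exporters import CsvItemExporter  # lazy import
        if page_no not in self.exporters:
            f = open(f"page_{page_no}.csv", "wb")
            exporter = CsvItemExporter(f)
            exporter.start_exporting()
            self.exporters[page_no] = (f, exporter)
        return self.exporters[page_no][1]

    def process_item(self, item, spider):
        self._exporter_for(item["page_no"]).export_item(item)
        return item

    def close_spider(self, spider):
        # Without this, the last file stays partially written.
        for f, exporter in self.exporters.values():
            exporter.finish_exporting()
            f.close()
```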
1 vote • 2 answers

Organizing scraped data based on the URL the data came from

I am creating a Scrapy program to scrape profile pages for numerical data. Each profile has a section that lists the different tags that the user uses. Each of those tags link to a paginated set of pages that shows all the posts made under that tag,…
harada • 183
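The usual pattern is to carry the tag along each paginated request so every item records where it came from, then group at the end. A sketch (the `tag` field and callback names are assumptions; `cb_kwargs` is available since Scrapy 1.7):

```python
from collections import defaultdict


def group_by_tag(items):
    """Merge per-page results into one bucket per tag."""
    grouped = defaultdict(list)
    for it in items:
        grouped[it["tag"]].append(it)
    return dict(grouped)

# In the spider, pass the tag through to every paginated page:
#   yield response.follow(tag_link, callback=self.parse_tag,
#                         cb_kwargs={"tag": tag_name})
#
#   def parse_tag(self, response, tag):
#       yield {"tag": tag, "posts": ...}
```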