Questions tagged [scrapy-pipeline]

218 questions
2 votes • 1 answer

Scrapy not calling the assigned pipeline when run from a script

I have a piece of code to test scrapy. My goal is to use scrapy without having to call the scrapy command from the terminal, so I can embed this code somewhere else. The code is the following: from scrapy import Spider from scrapy.selector import…
Santi Peñate-Vera • 1,053
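The usual cause here: pipelines listed in a project's settings.py are never seen when the spider is started from a plain script, so they must be registered explicitly. A minimal sketch (the quotes.toscrape.com URL and the `__main__.` path are assumptions; the path only resolves when the file is run directly):

```python
class CollectPipeline:
    """A plain item pipeline; Scrapy only requires a process_item method."""
    items = []

    def process_item(self, item, spider):
        self.items.append(item)  # appends to the shared class-level list
        return item


def run():
    # Scrapy is imported lazily so this sketch loads even without it installed.
    import scrapy
    from scrapy.crawler import CrawlerProcess

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]
        # When running from a script there is no project settings.py, so the
        # pipeline must be registered here or it is never called.
        custom_settings = {"ITEM_PIPELINES": {"__main__.CollectPipeline": 300}}

        def parse(self, response):
            for quote in response.css("div.quote"):
                yield {"text": quote.css("span.text::text").get()}

    process = CrawlerProcess(settings={"LOG_LEVEL": "ERROR"})
    process.crawl(QuotesSpider)
    process.start()  # blocks until the crawl finishes
```

Calling `run()` starts the crawl; afterwards `CollectPipeline.items` holds everything the pipeline saw.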
2 votes • 1 answer

Check if id exists in MongoDB with pymongo and scrapy

I have set up a spider with scrapy that sends data to a MongoDB database. I want to check to see if the id exists so that if it does I can $addToSet on a specific key (otherwise Mongo will reject the insert because the _id already exists). This is…
Eitan • 163
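An explicit existence check isn't needed: a single upsert covers both cases. A sketch with assumed field, database, and collection names:

```python
def upsert_spec(item):
    """Build the (filter, update) pair for an upsert: if the _id already
    exists, $addToSet extends the tags; otherwise the document is created.
    The field names here are assumptions about the item's shape."""
    return (
        {"_id": item["_id"]},
        {"$addToSet": {"tags": {"$each": item.get("tags", [])}}},
    )


class MongoPipeline:
    def open_spider(self, spider):
        # pymongo is imported lazily so the sketch loads without it installed.
        from pymongo import MongoClient
        self.client = MongoClient("mongodb://localhost:27017")  # assumed URI
        self.col = self.client["scraped"]["items"]              # assumed names

    def process_item(self, item, spider):
        flt, update = upsert_spec(dict(item))
        # upsert=True avoids the DuplicateKeyError a plain insert would raise
        self.col.update_one(flt, update, upsert=True)
        return item

    def close_spider(self, spider):
        self.client.close()
```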
1 vote • 1 answer

MySQL error in Python: mysql.connector.errors.ProgrammingError: Not enough parameters for the SQL statement

I'm running a class in the pipeline to save my Scrapy output to a database, but I'm getting an error: File "C:\Users\BRAINBOX\Downloads\freeCodeCamp-part-6-main\freeCodeCamp-part6\bookscraper\pipelines.py", line 63, in…
Daniel • 43
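This ProgrammingError usually means the number of `%s` placeholders doesn't match the parameter sequence (or a bare string was passed where a tuple is expected). A sketch with an assumed table schema and assumed connection credentials:

```python
INSERT_SQL = "INSERT INTO books (title, price, url) VALUES (%s, %s, %s)"


def item_to_params(item):
    """Build the parameter tuple; its length must match the number of %s
    placeholders in INSERT_SQL, and it must be a tuple, not a string."""
    return (item["title"], item["price"], item["url"])


class MySQLPipeline:
    def open_spider(self, spider):
        import mysql.connector  # lazy import; credentials below are assumptions
        self.conn = mysql.connector.connect(
            host="localhost", user="root", password="", database="bookscraper"
        )
        self.cur = self.conn.cursor()

    def process_item(self, item, spider):
        self.cur.execute(INSERT_SQL, item_to_params(item))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```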
1 vote • 2 answers

Why is the Scrapy image pipeline not downloading images?

I am trying to download all the images from the product gallery. I have tried the mentioned script but somehow I am not able to download the images. I could manage to download the main image which contains an id. The other images from the gallery do…
Raisul Islam • 277
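Two frequent causes when only the main image downloads: the gallery thumbnails are relative or lazy-loaded URLs (often in `data-src` rather than `src`), and the ImagesPipeline is silently skipped when `IMAGES_STORE` is unset. A sketch (the selectors are assumptions about the page):

```python
from urllib.parse import urljoin

# The ImagesPipeline only acts when it is enabled AND IMAGES_STORE is set;
# without IMAGES_STORE it is silently skipped.
IMAGE_SETTINGS = {
    "ITEM_PIPELINES": {"scrapy.pipelines.images.ImagesPipeline": 1},
    "IMAGES_STORE": "images",
}


def absolutize(base_url, srcs):
    """Gallery srcs are often relative; image_urls must contain absolute
    URLs or nothing is downloaded. Empty/None entries are dropped."""
    return [urljoin(base_url, s) for s in srcs if s]

# In the spider's parse():
#   srcs = response.css("div.gallery img::attr(data-src), "
#                       "div.gallery img::attr(src)").getall()
#   yield {"image_urls": absolutize(response.url, srcs)}
```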
1 vote • 1 answer

Get crawl_args from ScrapyRt and print them in the terminal

I have this spider: import scrapy import re import pandas as pd from scrapy_engine.items import PjudItem class updateSeekerSpider(scrapy.Spider): name = "updateSeeker" baseUrl =…
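ScrapyRt decodes the `crawl_args` parameter (URL-encoded JSON in the request) and passes each key as a keyword argument to the spider's `__init__`, so printing them there makes them show up in the terminal running ScrapyRt. A sketch (a plain class stands in for the `scrapy.Spider` subclass; the argument name is an assumption):

```python
class UpdateSeekerSketch:
    """Stand-in for the updateSeeker spider: ScrapyRt passes each key of
    the crawl_args JSON as a keyword argument to __init__."""
    name = "updateSeeker"

    def __init__(self, *args, **kwargs):
        self.crawl_args = kwargs
        print("crawl_args:", kwargs)  # visible in the ScrapyRt terminal

# Example request (crawl_args must be URL-encoded JSON), assuming the
# default ScrapyRt port:
#   curl "http://localhost:9080/crawl.json?spider_name=updateSeeker&crawl_args=%7B%22case_id%22%3A%2242%22%7D"
```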
1 vote • 0 answers

Scrapy: stop following requests for a specific target

My Scrapy spider has a bunch of independent target links to crawl. def start_requests(self): search_targets = get_search_targets() for search in search_targets: request = get_request(search.contract_type, search.postal_code, 1) …
Danial • 362
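Since each target drives its own chain of requests, one chain can be ended simply by not yielding a follow-up request for it, without touching the others. A sketch with assumed stop conditions:

```python
def next_page(search_state, page_items):
    """Decide the next page number for one target, or None to stop.
    Each target tracks its own state, so ending one chain of requests
    does not affect the others. The stop conditions are assumptions."""
    if not page_items:  # empty page: this target is exhausted
        return None
    if search_state["page"] >= search_state["max_pages"]:
        return None
    return search_state["page"] + 1

# In parse(), simply not yielding a follow-up Request ends that target:
#   nxt = next_page(response.meta["state"], items)
#   if nxt is not None:
#       yield get_request(search.contract_type, search.postal_code, nxt)
```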
1 vote • 1 answer

Scrapy returns "None" as result some of the time but not all the time

I have created a Scrapy crawl spider with Rule objects and a LinkExtractor to scrape Amazon bestseller products, but Scrapy sometimes returns None as a result even though I know for sure that my XPath expressions are correct. It's so weird that Scrapy…
Avn • 31
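Intermittent `None` results on Amazon often mean the site served a different page layout (or an anti-bot page) for some requests, so an XPath that is correct for one layout finds nothing on the others. One way to cope is to try several extraction variants in order; a sketch with illustrative XPaths:

```python
def first_match(getters):
    """Try several extraction callables in order and return the first
    non-empty result, stripped if it is a string."""
    for get in getters:
        value = get()
        if value:
            return value.strip() if isinstance(value, str) else value
    return None

# In the spider (the XPaths are illustrative, not Amazon's real markup):
#   title = first_match([
#       lambda: response.xpath('//span[@id="productTitle"]/text()').get(),
#       lambda: response.xpath("//h1//text()").get(),
#   ])
```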
1 vote • 0 answers

Scrapy sends duplicate documents to Elasticsearch

We use Scrapy to crawl a website that requires login. There is one website with different pages to crawl, so we have, for example, 3 different spiders and need just one login. So we tried to use one driver for all spiders, and we need to run…
Ramona K. • 23
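When several spiders index overlapping pages, duplicates can be avoided by deriving a deterministic `_id` so re-indexing overwrites the same document instead of creating a new one. A sketch (the `url` field and index name are assumptions):

```python
import hashlib


def es_doc_id(item):
    """Derive a stable _id from the item's URL so re-crawls and
    overlapping spiders overwrite the same Elasticsearch document
    instead of indexing duplicates."""
    return hashlib.sha1(item["url"].encode("utf-8")).hexdigest()

# In the pipeline, index with an explicit id (elasticsearch-py sketch):
#   es.index(index="pages", id=es_doc_id(item), document=dict(item))
```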
1 vote • 0 answers

Scrapy upload files to dynamically created directories in S3 based on field

I've been experimenting with Scrapy for some time now and recently have been trying to upload files (data and images) to an S3 bucket. If the directory is static, it is pretty straightforward and I didn't hit any roadblocks. But what I want to…
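The path under `FILES_STORE` comes from the pipeline's `file_path` method, so a dynamic S3 directory can be built from an item field. A sketch (the bucket name and the `category` field are assumptions; `file_path` has received the item since Scrapy 2.4):

```python
def dynamic_path(item, filename):
    """Build the key under FILES_STORE from an item field; with
    FILES_STORE = "s3://my-bucket/" the file lands at
    s3://my-bucket/<category>/<filename>."""
    return f"{item['category']}/{filename}"


def make_pipeline():
    from scrapy.pipelines.files import FilesPipeline  # lazy import

    class DynamicS3FilesPipeline(FilesPipeline):
        def file_path(self, request, response=None, info=None, *, item=None):
            return dynamic_path(item, request.url.rsplit("/", 1)[-1])

    return DynamicS3FilesPipeline
```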
1 vote • 1 answer

Trying to download files from a .py file without starting a Scrapy project; created a custom pipeline within the Python file, and the mentioned error occurs

import scrapy from scrapy.crawler import CrawlerProcess from scrapy.pipelines.files import FilesPipeline from urllib.parse import urlparse import os class DatasetItem(scrapy.Item): file_urls = scrapy.Field() files = scrapy.Field() class…
1 vote • 0 answers

Save downloaded files with custom names in scrapy

I am new to Scrapy. I downloaded some files using the code below. I want to change the names of my downloaded files but I don't know how. For example, I want to have a list containing names and use it to rename the files that I downloaded. Any help…
Pito • 61
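Download names are controlled by overriding `file_path` in a `FilesPipeline` subclass; the spider can attach the desired name to each item. A sketch (the `file_name` field is an assumption; the item is passed to `file_path` since Scrapy 2.4):

```python
import os
from urllib.parse import urlparse


def renamed(url, new_name):
    """Keep the downloaded file's extension but use a caller-chosen name,
    under the pipeline's default "full/" directory."""
    ext = os.path.splitext(urlparse(url).path)[1]
    return f"full/{new_name}{ext}"


def make_renaming_pipeline():
    from scrapy.pipelines.files import FilesPipeline  # lazy import

    class RenamingFilesPipeline(FilesPipeline):
        def file_path(self, request, response=None, info=None, *, item=None):
            return renamed(request.url, item["file_name"])

    return RenamingFilesPipeline
```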
1 vote • 1 answer

How to run multiple spiders through individual pipelines?

Total noob just getting started with Scrapy. My directory structure looks like this… #FYI: running on Scrapy 2.4.1 WebScraper/ Webscraper/ spiders/ spider.py # (NOTE: contains spider1 and spider2 classes.) items.py …
yeqiuuu • 97
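Each spider can opt into its own pipeline via `custom_settings`, so both can live in one project without sharing pipelines. A sketch (the `processed_by` field and the `webscraper.pipelines` path are assumptions):

```python
class Spider1Pipeline:
    def process_item(self, item, spider):
        item["processed_by"] = "spider1"  # illustrative marker
        return item


class Spider2Pipeline:
    def process_item(self, item, spider):
        item["processed_by"] = "spider2"
        return item

# In spiders/spider.py, each spider names its own pipeline:
#
#   class Spider1(scrapy.Spider):
#       name = "spider1"
#       custom_settings = {
#           "ITEM_PIPELINES": {"webscraper.pipelines.Spider1Pipeline": 300},
#       }
#
#   class Spider2(scrapy.Spider):
#       name = "spider2"
#       custom_settings = {
#           "ITEM_PIPELINES": {"webscraper.pipelines.Spider2Pipeline": 300},
#       }
```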
1 vote • 0 answers

Exception raised in file_path function in Scrapy pipeline not shown

So, I put a simple exception in an image pipeline like this: class MyImagesPipeline(ImagesPipeline): #Name download version def file_path(self, request, response=None, info=None): raise Exception() print("It get's into…
Aminah Nuraini • 18,120
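Media pipelines call `file_path` inside Twisted callbacks, where an uncaught exception tends to surface only as a logged Failure (or not at all at higher log levels) rather than crashing the crawl. One way to make such failures visible is to catch and log them explicitly; a sketch with an assumed path scheme:

```python
import logging

logger = logging.getLogger("images_pipeline")


def logged_file_path(url):
    """Compute a download path, logging instead of silently losing
    exceptions, and falling back to a placeholder name."""
    try:
        return "full/" + url.rsplit("/", 1)[1]
    except Exception:
        logger.exception("file_path failed for %s", url)
        return "full/unknown"

# In the pipeline subclass:
#   def file_path(self, request, response=None, info=None, *, item=None):
#       return logged_file_path(request.url)
```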
1 vote • 1 answer

CsvItemExporter for multiple files in custom item pipeline not exporting all items

I have created an item pipeline as an answer to this question. It is supposed to create a new file for every page according to the page_no value set in the item. This works mostly fine. The problem is with the last csv file generated by the…
Patrick Klein • 1,161
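A likely culprit when only the last CSV is incomplete: `CsvItemExporter` buffers rows, so every exporter must be finished and its file closed in `close_spider`, or the final file's buffered rows never reach disk. A sketch (file naming is an assumption):

```python
class PerPageCsvPipeline:
    """One CSV file per page_no value on the item."""

    def open_spider(self, spider):
        self.exporters = {}  # page_no -> (file, exporter)

    def _exporter_for(self, page_no):
        from scrapy.exporters import CsvItemExporter  # lazy import
        if page_no not in self.exporters:
            f = open(f"page_{page_no}.csv", "wb")
            exporter = CsvItemExporter(f)
            exporter.start_exporting()
            self.exporters[page_no] = (f, exporter)
        return self.exporters[page_no][1]

    def process_item(self, item, spider):
        self._exporter_for(item["page_no"]).export_item(item)
        return item

    def close_spider(self, spider):
        # Without this, the last file stays partially written.
        for f, exporter in self.exporters.values():
            exporter.finish_exporting()
            f.close()
```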
1 vote • 2 answers

Organizing scraped data based on the URL the data came from

I am creating a Scrapy program to scrape profile pages for numerical data. Each profile has a section that lists the different tags that the user uses. Each of those tags link to a paginated set of pages that shows all the posts made under that tag,…
harada • 183
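The usual pattern is to carry the tag along each paginated request so every item records where it came from, then group at the end. A sketch (the `tag` field and callback names are assumptions; `cb_kwargs` is available since Scrapy 1.7):

```python
from collections import defaultdict


def group_by_tag(items):
    """Merge per-page results into one bucket per tag."""
    grouped = defaultdict(list)
    for it in items:
        grouped[it["tag"]].append(it)
    return dict(grouped)

# In the spider, pass the tag through to every paginated page:
#   yield response.follow(tag_link, callback=self.parse_tag,
#                         cb_kwargs={"tag": tag_name})
#
#   def parse_tag(self, response, tag):
#       yield {"tag": tag, "posts": ...}
```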