Questions tagged [scrapy-pipeline]

218 questions
3
votes
2 answers

Scrapy: how to send the items to the site via the api

Now my spiders are sending data to my site in this way: def parse_product(response, **cb_kwargs): item = {} item[url] = response.url data = { "source_id": 505, "token": f"{API_TOKEN}", "products": [item] …
m_sasha
  • 239
  • 1
  • 7
3
votes
1 answer

Use Django's models in a Scrapy project (in the pipeline)

This has been asked before, but the answer that always comes up is to use DjangoItem. However, its GitHub page states that it is: "often not a good choice for a write intensive applications (such as a web crawler) ... may not scale well". This is the crux…
Max Smith
  • 925
  • 1
  • 14
  • 25
3
votes
1 answer

Pass file_name argument to pipeline for csv export in scrapy

I need Scrapy to take an argument (-a FILE_NAME="stuff") from the command line and apply that to the file created by my CSVWriterPipeLine in the pipelines.py file. (The reason I went with pipelines.py was that the built-in exporter was repeating data…
Josh Usre
  • 674
  • 1
  • 12
  • 35
2
votes
1 answer

How to send data from one pipeline to another in Scrapy

Hello I have two pipelines, the first one to download photos: class ModelsPipeline(ImagesPipeline): def get_media_requests(self, item, info): for image_url in item['image_urls']: yield scrapy.Request(image_url) def…
Tlaloc-ES
  • 4,825
  • 7
  • 38
  • 84
2
votes
1 answer

scrapy image pipeline filename using other crawled info

Is there any way to name a crawled image with other info (text) that we get with the spider? For example, in this case I want images named with the article title and article published date that I got in the spider: spider file # lines of code def parse(self,…
Violet
  • 33
  • 7
2
votes
1 answer

Right way to scrape this noisy price tag

Given a <div> containing a price with a lot of noise: Price 1\u00a0500\u00a0000 EUR and you want only the pure amount (1500000), what is the best way to implement this in Scrapy? I tried to combine regex: il.add_css('price', 'div.price_tag::text',…
szeta
  • 589
  • 1
  • 5
  • 21
2
votes
3 answers

Scrapy: How to access the custom, CLI passed settings from the __init__() method of a spider class?

I need to access the custom settings passed from the CLI using: -s SETTING_NAME="SETTING_VAL" from the __init__() method of the spider class. get_project_settings() allows me to access only the static settings. The docs explain how you can access…
Nikolay Shindarov
  • 1,616
  • 2
  • 18
  • 25
2
votes
1 answer

Python + Scrapy: Issues running "ImagesPipeline" when running crawler from script

I'm brand new to Python so I apologize if there's a dumb mistake here...I've been scouring the web for days, looking at similar issues and combing through Scrapy docs and nothing seems to really resolve this for me... I have a Scrapy project which…
tycrone
  • 21
  • 2
2
votes
1 answer

Scrapinghub plugs my results in the log and not in item

I have a functioning spider project to extract the content of URLs (no CSS). I crawled several sets of data and stored them in a series of .csv files. Now I am trying to set it up to work on Scrapinghub in order to run a long scraping job. So far, I am able to…
Freddy
  • 73
  • 8
2
votes
2 answers

In which file/place should Scrapy process the data?

Scrapy has several places where scraped data can be processed: spider, items and spider middlewares. But I don't understand where I should do it. I can process some scraped data in all these places. Could you explain to me the differences…
Sergey
  • 57
  • 5
2
votes
0 answers

scrapy.pipeline ImagePipeline func file_path . When I return str

The function file_path parameter in ImagesPipeline responds. When I return a str directly, I can successfully download the image. If response.meta.get('file_name') is used, the download fails, even though it is also a string. Why can't the variable be…
S.DZ
  • 29
  • 1
2
votes
3 answers

.json export formatting in Scrapy

Just a quick question about json export formatting in Scrapy. My exported file looks like this. {"pages": {"title": "x", "text": "x", "tags": "x", "url": "x"}} {"pages": {"title": "x", "text": "x", "tags": "x", "url": "x"}} {"pages": {"title": "x",…
2
votes
2 answers

Scrapy error: 'Pipeline' object has no attribute 'exporter'

I made a scraper and am using this tutorial to export using a pipeline. When I run scrapy crawl [myspider] I see the objects flashing by in my terminal, but after each it gives the error 'PostPipeline' object has no attribute 'exporter'. My…
Teresa
  • 353
  • 1
  • 5
  • 27
2
votes
1 answer

Return image contents by Scrapy-Splash

I'm using Scrapy-Splash requests to get a rendered screenshot of a page, but I also need the images on that page. I use the pipelines to download those images, but I was thinking - does this not make two requests for the same image? Once when Splash…
Akustik
  • 43
  • 8
2
votes
3 answers

Scrapy Pipeline doesn't insert into MySQL

I'm trying to build a small app for a university project with Scrapy. The spider is scraping the items, but my pipeline is not inserting data into the MySQL database. In order to test whether the pipeline is not working or the pymysql implementation is…