My spider extracts the list of image URLs, but the images themselves are not being downloaded.

import scrapy
from ..items import GoogleItem

class spider(scrapy.Spider):

    name = 'google'
    start_urls = [
        "https://www.google.com/search?q=selena%20gomez&spell=1&ie=UTF-8&gbv=1&tbm=isch&sxsrf=ALeKk01ILeLRLtOpXXVyxa1PQYz38RnbRg%3A1586860191506&ei=n5CVXu2GHPGY4-EPreeA-AM&sa=N&btnG=Search"
    ]

    def parse(self, response):
        item = GoogleItem()
        img_url = []
        for image in response.xpath('/html/body/table[4]/tr/td/a/img/@src').extract():
            img_url.append(image)

        item["image_urls"] = img_url

        return item

In the item definition I have this:

import scrapy


class GoogleItem(scrapy.Item):
    images = scrapy.Field()
    image_urls = scrapy.Field()

In the settings, this:

ITEM_PIPELINES = {
    'test_scrapy.pipelines.TestScrapyPipeline': 1,
}


IMAGE_STORE ="E:/New folder/image"

I am getting this in my terminal, along with the image URLs:

{'downloader/request_bytes': 404, 'downloader/request_count': 1,

What did I do wrong?

Deepu

1 Answer

Your ITEM_PIPELINES setting must also include the image downloader pipeline:

ITEM_PIPELINES = {
    # your pipeline that adds `image_urls` field
    'test_scrapy.pipelines.TestScrapyPipeline': 1,
    # Scrapy's built-in pipeline that downloads images
    'scrapy.pipelines.images.ImagesPipeline': 999,
}

For more, see the official docs on media pipelines: https://docs.scrapy.org/en/latest/topics/media-pipeline.html#enabling-your-media-pipeline
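Two more details are worth checking. As a minimal sketch of the relevant part of settings.py, assuming the project package is named test_scrapy as in the question: the built-in pipeline lives under the scrapy package (not under your project package), and the storage setting that ImagesPipeline reads is IMAGES_STORE, whereas the question uses IMAGE_STORE.

```python
# settings.py (sketch; the package name `test_scrapy` is taken from the question)

ITEM_PIPELINES = {
    # The project's own pipeline, as configured in the question.
    "test_scrapy.pipelines.TestScrapyPipeline": 1,
    # Scrapy's built-in image pipeline. The dotted path starts with
    # `scrapy.`, not with the project package name.
    "scrapy.pipelines.images.ImagesPipeline": 999,
}

# ImagesPipeline reads IMAGES_STORE (with an "S" after IMAGE).
# Without this setting the pipeline stays disabled.
IMAGES_STORE = "E:/New folder/image"
```

ImagesPipeline also needs the Pillow library installed. Note that 'downloader/request_bytes': 404 in the stats is the number of bytes sent in requests, not an HTTP 404 status, and 'downloader/request_count': 1 suggests that only the search page was fetched and no image requests were made, which is what you would see while the image pipeline is not active.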

Granitosaurus
  • I have done that. Now I am getting "ModuleNotFoundError: No module named 'test_scrapy.pipelines.images'; 'test_scrapy.pipelines' is not a package" – Deepu Apr 16 '20 at 06:05
  • I solved that import problem, but the files are still not getting downloaded. – Deepu Apr 16 '20 at 13:35
  • I am only getting the image_urls, and I still see the same 'downloader/request_bytes': 404 line in the output. – Deepu Apr 16 '20 at 13:47