EDIT:
This is not a duplicate of questions about older versions of Scrapy. Scrapy has changed in recent years, and the current version is 0.24.

Scrapy has evolved dramatically over the past few years of development. Most Stack Overflow answers about Scrapy are outdated.

I am using Scrapy 0.24.4 and want to download the images for each link into a separate folder. Following the Scrapy documentation, I am able to download images, but they all end up in a single folder.

I am trying to use the code below so that images are saved in a separate folder for each URL, but I can't get it to work. The code lives in pipelines.py, yet it doesn't even run. Only the default behaviour of the images pipeline is executed, i.e. it downloads every URL in item['image_urls'].

pipelines.py

import scrapy
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.exceptions import DropItem
import urlparse 
import urllib

class RecursiveScrapPipeline(object):

    """Custom Image to save in Structured folder """

    def process_item(self, item, spider):
        #item currently is image name
        image_guid =  item

        return "%s/full/%s.jpg"% (id,image_guid)
    #this should work , exactly as per documentation 



    def get_media_requests(self, item, info):

        for image_url in item['image_urls']:
            yield scrapy.Request(image_url,meta={'id':item['Property_name']})

Am I on the right track? What could the solution be?

igauravsehrawat
  • possible duplicate of [Scrapy : create folder structure out of downloaded images based on the url from which images are being downloaded](http://stackoverflow.com/questions/12956653/scrapy-create-folder-structure-out-of-downloaded-images-based-on-the-url-from) – Christian Strempfer Oct 22 '14 at 06:42
  • @Chris That is an older version, 2 years old. I am using the recent version, which is also mentioned in the question. – igauravsehrawat Oct 22 '14 at 07:45

1 Answer

I'm not actually sure what you are trying to do in this pipeline, but something seems very wrong. It is possible that I have completely misunderstood your intent, in which case please elaborate on the details of your implementation.

In the meantime, here are some things that could be problematic:

  1. You should inherit from ImagesPipeline if your goal is to alter the default behaviour of this pipeline. You should also make sure your pipeline is enabled in settings.py.

  2. The method process_item() should return an Item() object or raise a DropItem() exception, but you are returning a string. To make it worse, it is a string created by implicitly casting an item object to a string, which makes no sense in this context. It makes even less sense given that you should not override that method in ImagesPipeline at all.

  3. You have no implementation of item_completed(), which is the method called when all image requests for a single item have completed (either finished downloading, or failed for some reason). From there, you can see the path the image has been downloaded to, and move it if necessary.
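As a rough illustration of point 3, here is a minimal sketch of the "move it if necessary" step. It uses only the standard library; the helper name move_images_to_folder and its parameters are hypothetical, not part of Scrapy. It assumes results has the shape that item_completed() receives: a list of (success, info) tuples, where info is a dict with a 'path' key relative to the images store when success is True.

```python
import os
import shutil


def move_images_to_folder(results, store_dir, folder_name):
    """Move each successfully downloaded image into store_dir/folder_name/.

    Hypothetical helper, not a Scrapy API. Returns the new paths,
    relative to store_dir.
    """
    target_dir = os.path.join(store_dir, folder_name)
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)

    new_paths = []
    for success, info in results:
        if not success:
            continue  # failed download: info is a failure, not a dict
        old_path = os.path.join(store_dir, info['path'])
        new_rel = os.path.join(folder_name, os.path.basename(info['path']))
        shutil.move(old_path, os.path.join(store_dir, new_rel))
        new_paths.append(new_rel)
    return new_paths
```

In a subclass of ImagesPipeline, item_completed(self, results, item, info) could call this helper with the directory configured as IMAGES_STORE and item['Property_name'] as the folder name, then return the item. If the property name can contain characters that are invalid in a path, it would need sanitising first.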

Please read the "Downloading Item Images" section of the official documentation for further clarification.
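For the "enabled in settings.py" part of point 1, the relevant settings would look roughly like the fragment below. This is a sketch: the project name myproject and the store path are placeholders, and the class path assumes your pipeline keeps the name used in the question.

```python
# settings.py (sketch; 'myproject' and the store path are placeholders)

# Map the pipeline's import path to its order value.
ITEM_PIPELINES = {
    'myproject.pipelines.RecursiveScrapPipeline': 1,
}

# Directory under which the images pipeline stores downloaded files.
IMAGES_STORE = '/path/to/valid/dir'
```

If the class path here does not match a class that actually subclasses ImagesPipeline, your overridden methods will never be called.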

bosnjak