
I thought I had found a solution using the RFC2616 policy, but when I tested the scraper's execution time it seemed to stay the same, so I went back to the default policy.
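For reference, this is roughly what I was toggling in settings.py (both policy paths are Scrapy's built-in classes):

# settings.py -- rough sketch of the cache settings I was switching between
HTTPCACHE_ENABLED = True

# What I tried first (honours Cache-Control / Expires headers):
# HTTPCACHE_POLICY = 'scrapy.extensions.httpcache.RFC2616Policy'

# What I'm back on now (caches every response unconditionally):
HTTPCACHE_POLICY = 'scrapy.extensions.httpcache.DummyPolicy'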

My image_urls are handled by my custom images pipeline:

'production.pipelines.MyImagesPipeline'
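That pipeline is wired up in settings.py along these lines (MyImagesPipeline is my own ImagesPipeline subclass; the store path here is just an example):

# settings.py -- sketch of the pipeline wiring, store path is a placeholder
ITEM_PIPELINES = {
    'production.pipelines.MyImagesPipeline': 1,
}
IMAGES_STORE = '/path/to/images'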

Now I only need to cache the URLs I send to the item's image_urls field.

From my understanding, you can override the policy by specifying something like this:

class DummyPolicy(object):

    def should_cache_response(self, response, request):
        # I want this to return True only when the response URL is one of
        # the URLs I put into item['image_urls'], but I don't know how to
        # get at the item from here.
        if response.url in item['image_urls']:
            return True
        else:
            return False

    def is_cached_response_valid(self, cachedresponse, response, request):
        return True
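If I understand the docs correctly, the custom policy would then be registered in settings.py like this (the 'production.policy' module path is just where I would put it):

# settings.py -- point the cache middleware at the custom policy
HTTPCACHE_ENABLED = True
HTTPCACHE_POLICY = 'production.policy.DummyPolicy'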

Any code suggestions for getting this working?

Kevin G

1 Answer


I created a solution by adding the dont_cache meta key to certain yielded requests:

yield scrapy.Request(url, self.parse, meta={'dont_cache': True})
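
Here is a rough sketch of how it fits together in a spider: the page requests I yield myself carry dont_cache, while the image requests the images pipeline issues for item['image_urls'] are left alone, so only those end up in the cache. The spider name, URLs, and CSS selectors below are just placeholders:

import scrapy

class ProductSpider(scrapy.Spider):
    name = 'products'                      # placeholder name
    start_urls = ['http://example.com/']   # placeholder URL

    def parse(self, response):
        # Don't cache the HTML pages I crawl myself
        for href in response.css('a.product::attr(href)').getall():
            yield scrapy.Request(response.urljoin(href), self.parse_item,
                                 meta={'dont_cache': True})

    def parse_item(self, response):
        # The images pipeline downloads these URLs without dont_cache,
        # so those responses do get cached by the default policy
        yield {'image_urls': response.css('img.main::attr(src)').getall()}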
Kevin G
  • 2,325
  • 3
  • 16
  • 30