
I thought I had found a solution using the RFC2616 policy, but when I tested the scraper's execution time it seemed to stay the same, so I went back to the default policy.
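For reference, this is roughly what I was toggling in settings.py (both policy paths are Scrapy's built-in classes):

# settings.py -- rough sketch of the cache settings I was switching between
HTTPCACHE_ENABLED = True

# What I tried first (honours Cache-Control / Expires headers):
# HTTPCACHE_POLICY = 'scrapy.extensions.httpcache.RFC2616Policy'

# What I'm back on now (caches every response unconditionally):
HTTPCACHE_POLICY = 'scrapy.extensions.httpcache.DummyPolicy'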

My image_urls are handled by my custom images pipeline:

'production.pipelines.MyImagesPipeline'
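That pipeline is wired up in settings.py along these lines (MyImagesPipeline is my own ImagesPipeline subclass; the store path here is just an example):

# settings.py -- sketch of the pipeline wiring, store path is a placeholder
ITEM_PIPELINES = {
    'production.pipelines.MyImagesPipeline': 1,
}
IMAGES_STORE = '/path/to/images'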

Now I only need to cache the URLs I send to the item's image_urls field.

From my understanding, you can override the policy by specifying something like this:

class DummyPolicy(object):

    def should_cache_response(self, response, request):
        # I want this to return True only when the response URL is one of
        # the URLs I put into item['image_urls'], but I don't know how to
        # get at the item from here.
        if response.url in item['image_urls']:
            return True
        else:
            return False

    def is_cached_response_valid(self, cachedresponse, response, request):
        return True
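If I understand the docs correctly, the custom policy would then be registered in settings.py like this (the 'production.policy' module path is just where I would put it):

# settings.py -- point the cache middleware at the custom policy
HTTPCACHE_ENABLED = True
HTTPCACHE_POLICY = 'production.policy.DummyPolicy'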

Any code suggestions for getting this working?

Kevin G

1 Answer


I created a solution by adding the dont_cache meta key to certain yielded requests:

yield scrapy.Request(url, self.parse, meta={'dont_cache': True})
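
Here is a rough sketch of how it fits together in a spider: the page requests I yield myself carry dont_cache, while the image requests the images pipeline issues for item['image_urls'] are left alone, so only those end up in the cache. The spider name, URLs, and CSS selectors below are just placeholders:

import scrapy

class ProductSpider(scrapy.Spider):
    name = 'products'                      # placeholder name
    start_urls = ['http://example.com/']   # placeholder URL

    def parse(self, response):
        # Don't cache the HTML pages I crawl myself
        for href in response.css('a.product::attr(href)').getall():
            yield scrapy.Request(response.urljoin(href), self.parse_item,
                                 meta={'dont_cache': True})

    def parse_item(self, response):
        # The images pipeline downloads these URLs without dont_cache,
        # so those responses do get cached by the default policy
        yield {'image_urls': response.css('img.main::attr(src)').getall()}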
Kevin G
  • 2,325
  • 3
  • 16
  • 30