How can I check if Scrapy Image Pipeline is using a proxy to download images?

Question

I have built a scraper and would like to download some images through a proxy in Scrapy, but I don't know whether it is really downloading through the proxy. The response headers don't show the IP. Furthermore, if I change the proxy IP to a random IP, it still downloads the image. How can I make sure it is using the proxy to download the images? Thanks

pipelines.py

import scrapy
from scrapy.pipelines.images import ImagesPipeline

class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Ask Scrapy to route every image download through this proxy
        meta = {'proxy': 'http://23.323.44.22:11111/'}
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url, meta=meta)

settings.py

ITEM_PIPELINES = {'myproject.pipelines.MyImagesPipeline': 1}

score 1 · Accepted Answer · answered Mar 25 '17 at 18:26

If the download works with a random IP, the proxy is not used.
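A quick way to back this up is to check whether anything is even listening at the configured proxy address. The sketch below uses only the Python standard library; proxy_reachable is a hypothetical helper name. A successful TCP connect only shows that some server is listening there, not that Scrapy actually sent traffic through it.

```python
import socket
from urllib.parse import urlsplit


def proxy_reachable(proxy_url: str, timeout: float = 3.0) -> bool:
    """Return True if something accepts TCP connections at the proxy's host:port.

    Hypothetical helper: it proves a server is listening, nothing more.
    """
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {proxy_url!r}")
    try:
        # create_connection resolves the host and opens a plain TCP socket
        with socket.create_connection((parts.hostname, parts.port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False
```

If this returns False for the random IP you tried while the images still arrive, the images cannot have come through that address.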

The Scrapy docs say: "You can also set the meta key proxy per-request, to a value like http://some_proxy_server:port." Maybe the '/' at the end of your proxy URL confuses Scrapy?
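To rule the trailing slash out, you could normalize the proxy value to exactly the scheme://host:port form from the docs before putting it in meta. normalize_proxy below is a hypothetical helper built on urllib.parse; note that it drops any user:password part, so as written it is not suitable for authenticated proxies.

```python
from urllib.parse import urlsplit


def normalize_proxy(proxy_url: str) -> str:
    """Rewrite a proxy URL into the bare scheme://host:port form.

    Hypothetical helper: drops any path (e.g. a trailing '/') and any
    user:password credentials.
    """
    parts = urlsplit(proxy_url.strip())
    if not parts.scheme or parts.hostname is None or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {proxy_url!r}")
    host = parts.hostname
    if ":" in host:  # IPv6 literal: urlsplit strips the brackets, put them back
        host = f"[{host}]"
    return f"{parts.scheme}://{host}:{parts.port}"
```

In the pipeline you would then use meta = {'proxy': normalize_proxy('http://23.323.44.22:11111/')}, which sets the proxy to 'http://23.323.44.22:11111'.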

To make sure the proxy is used, I would use Wireshark and filter on the proxy's IP. If you see traffic to its IP, the proxy is likely being used.
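Before reaching for Wireshark, you can at least confirm from inside Scrapy that every image request carries the proxy meta key. Requests yielded by ImagesPipeline go through the downloader middlewares, so a small middleware can log it. The class name ProxyLoggingMiddleware and its module path are my assumptions. This only shows what Scrapy was told to use, not the actual network path, so Wireshark remains the definitive check.

```python
import logging

logger = logging.getLogger(__name__)


class ProxyLoggingMiddleware:
    """Hypothetical downloader middleware that logs the proxy each download used.

    It reads request.meta['proxy'], i.e. what Scrapy was asked to use;
    it cannot observe where the bytes actually travelled.
    """

    def __init__(self):
        self.seen = []  # (url, proxy or None) pairs, kept for inspection

    def process_response(self, request, response, spider):
        proxy = request.meta.get("proxy")
        self.seen.append((request.url, proxy))
        if proxy:
            logger.info("Downloaded %s via proxy %s", request.url, proxy)
        else:
            logger.warning("Downloaded %s without a proxy", request.url)
        return response  # pass the response on unchanged
```

Enable it in settings.py with DOWNLOADER_MIDDLEWARES = {'myproject.middlewares.ProxyLoggingMiddleware': 543}. Keep in mind that images the pipeline finds already stored in IMAGES_STORE (and younger than IMAGES_EXPIRES) are not downloaded again, so they produce no log line at all; clearing the image store while testing avoids that confusion.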

rfelten

Thanks, I will try this and report. – zer02 Mar 30 '17 at 14:47