I have an issue where my ImagesPipeline downloads some images while completely ignoring others. I test this by hardcoding the image URL with loader.add_value().
Here are two examples using the same image. Note that I only use one of these lines at a time, never both together.
# Test A, Works fine. Scrapy DOES download.
loader.add_value('image_urls', ['http://hemmon.com/house.jpg'])
# Test B, Not working. Scrapy does NOT download.
loader.add_value('image_urls', ['https://media.fastighetsbyran.se/23566167.jpg?Bredd=300'])
Test A is downloaded successfully; Test B is completely ignored. No debug messages, no errors, nothing. I run exactly the same settings.py with no other changes. The image file is the same: I downloaded it in the browser from the Test B URL and uploaded it, unmodified, to my own website at the Test A URL.
Note that I also tried other files on the same host. All of them are ignored.
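Since every file on that host is ignored, one way to check whether the host itself treats clients differently is to request the image outside Scrapy with different User-Agent values and compare the responses. Below is a minimal sketch using only the standard library; the helper name probe and its opener parameter are my own inventions for illustration, not part of Scrapy.

```python
import urllib.error
import urllib.request


def probe(url, user_agent, opener=urllib.request.urlopen):
    """Send a GET with the given User-Agent and return (status, content_type).

    `opener` is injectable so the function can be exercised without
    touching the network; by default it is urllib's urlopen.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=10) as response:
            return response.status, response.headers.get("Content-Type")
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; report them the same way.
        return error.code, error.headers.get("Content-Type")
```

Calling probe() on the Test B URL once with a browser-like User-Agent string and once with something shaped like Scrapy's default ("Scrapy/VERSION (+https://scrapy.org)") should show whether the status code or content type differs between the two.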
Here's my settings.py:
import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
IMAGES_STORE = os.path.join(BASE_DIR, 'images')
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
I found this post, which seems to describe a similar issue that turned out to be related to request headers. That would explain why I can download the exact same image from one host but not from the other.
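To test the header theory, something like the following could be added to settings.py. This is only a sketch under the assumption that the host rejects requests carrying Scrapy's default headers, which I have not confirmed. The User-Agent string and Accept value are arbitrary browser-like placeholders; USER_AGENT, DEFAULT_REQUEST_HEADERS and LOG_LEVEL are standard Scrapy settings.

```python
# Assumption: the host filters on request headers. These values are
# placeholders that merely look like a desktop browser.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# Applied by Scrapy's default-headers middleware to outgoing requests
# that do not already set these headers.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "image/avif,image/webp,image/*,*/*;q=0.8",
    "Accept-Language": "en",
}

# Verbose logging, to make any pipeline or downloader messages visible.
LOG_LEVEL = "DEBUG"
```

If Test B starts downloading with these settings in place, that would point to the headers; if not, the cause is likely elsewhere.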
EDIT: I created a public repo that reproduces this issue.