
I'm testing the urllib.request library for downloading image files. Several URLs worked, but one fails with an HTTP 500 server error, even though the same URL displays the image correctly when I open it in a browser. Why is this happening?

import urllib.request

image_url = 'https://cloudinary.images-iherb.com/image/upload/f_auto%2cq_auto:eco/images/sug/sug00972/l/42.jpg' # this is not working. HTTP 500 error
# image_url = 'https://m.media-amazon.com/images/I/81Hxz-Y6imL._AC_SL1500_.jpg' # this is working
folder_name = '.'
savefile = 'test.jpg'
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'MyApp/1.0')]
urllib.request.install_opener(opener)
urllib.request.urlretrieve(image_url, folder_name+"/"+savefile)


Thanks for the answers.

I tested with pycurl, but the server returned an empty string, so I used this code instead:

import requests

url = 'http://cloudinary.images-iherb.com/image/upload/f_auto%2cq_auto:eco/images/sug/sug00972/l/42.jpg'

try:
    r = requests.get(url, headers={'Referer': 'https://example.com', 'User-Agent': 'Mozilla/5.0', 'Accept': 'image/webp,*/*'})
    r.raise_for_status()
    image_data = r.content
except requests.exceptions.RequestException as e:
    print("Error: %s" % e)
    image_data = None

# Save the image data to a file
if image_data:
    with open('image.jpg', 'wb') as f:
        f.write(image_data)
else:
    print("No image data was downloaded.")
EUN WOO LEE

2 Answers


The server may be refusing requests that don't come from a browser. You are sending MyApp/1.0, which is not a known browser User-Agent string. Try sending a User-Agent string that corresponds to a known browser such as Chrome:

import urllib.request

image_url = 'https://cloudinary.images-iherb.com/image/upload/f_auto%2cq_auto:eco/images/sug/sug00972/l/42.jpg'
folder_name = '.'
savefile = 'test.jpg'
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36')]
urllib.request.install_opener(opener)
urllib.request.urlretrieve(image_url, folder_name+"/"+savefile)
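
If you would rather not install a global opener, the same idea can be sketched with a per-request header and explicit error handling, so a 500 is reported instead of raising an unhandled exception. This is only a minimal sketch: `fetch_image` and `BROWSER_UA` are hypothetical names introduced here, the User-Agent value is an assumed placeholder, and I have not verified which headers this particular server actually checks.

```python
import shutil
import urllib.error
import urllib.request

# Assumed browser-like User-Agent; any current browser string could be used instead.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def fetch_image(url, dest, user_agent=BROWSER_UA, timeout=30):
    """Download url to dest. Return True on success, False on an HTTP error."""
    # Headers are attached to this request only, not installed globally.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # urlopen raises HTTPError before dest is opened, so a failed
        # request does not leave an empty file behind.
        with urllib.request.urlopen(request, timeout=timeout) as response, \
                open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
        return True
    except urllib.error.HTTPError as err:
        # err.code is the status (for example 500), err.reason the phrase.
        print(f"HTTP {err.code} {err.reason} for {url}")
        return False
```

Calling `fetch_image(image_url, 'test.jpg')` and then again with `user_agent='MyApp/1.0'` should show whether the User-Agent alone makes the difference for a given URL.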
Saxtheowl

The issue is that cloudinary.images-iherb.com is trying to use HTTP/2, which urllib does not support. A linked thread explores alternatives to urllib for HTTP/2, but it's probably best to just use pycurl:

import pycurl
from io import BytesIO
import certifi

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'http://cloudinary.images-iherb.com/image/upload/f_auto%2cq_auto:eco/images/sug/sug00972/l/42.jpg')
c.setopt(c.WRITEDATA, buffer)
c.setopt(c.CAINFO, certifi.where())
c.perform()
c.close()

with open('img.jpg', 'wb') as f:
    f.write(buffer.getvalue())
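
Since the snippet above writes whatever is in the buffer, an empty body or an error page would be saved as img.jpg without any warning. One way to guard against that is to check the leading bytes before writing. The following is a rough sketch using only the standard library; `sniff_image_type` and `save_if_image` are hypothetical helper names, and the check covers only a few common formats.

```python
def sniff_image_type(data):
    """Return 'jpeg', 'png', 'gif', 'webp', or None if no signature matches."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container with 'WEBP' at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def save_if_image(data, path):
    """Write data to path only when it looks like an image; return the type."""
    kind = sniff_image_type(data)
    if kind is not None:
        with open(path, "wb") as f:
            f.write(data)
    return kind
```

With the pycurl code above, `save_if_image(buffer.getvalue(), 'img.jpg')` would return None and write nothing when the server sends back an empty string. It also reports the detected format, which may not be JPEG even though the file is named .jpg.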
Michael M.