Problem with files downloaded using Python

Question

I am trying to download some JPGs from a site and save them to my hard drive, but I can't open the resulting files because of a formatting problem. For some reason, every file is also 115 KB.

I've tried changing the chunk size and experimented a little with the request, but it didn't work. There are no errors in the shell, and the website's link is correct.

url = 'http://<site>'
os.makedirs('photos', exist_ok = True)
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, features="html.parser")
elem = soup.select('a img')
if elem == []:
    print('no images')
else:
    for i in range(len(elem)):
        link = elem[i].get('src')
        if link != None:
            plik = open(os.path.join('photos', os.path.basename(link)), 'wb')
            for chunk in res.iter_content(100000):
                plik.write(chunk)
            plik.close()
            print('downloaded %s' % os.path.basename(link))

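Solution (the change is inside the 'for' loop, where each image now gets its own request):

import os
from urllib.parse import urljoin

import bs4
import requests

url = 'http://<site>'
os.makedirs('photos', exist_ok=True)
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, features="html.parser")
elem = soup.select('a img')
if not elem:
    print('no images')
else:
    for img in elem:
        src = img.get('src')
        if src is None:
            continue  # <img> without a src attribute, nothing to download
        # Resolve the src against the page URL, since it may be relative.
        link = urljoin(url, src)
        # Second request: fetch the image itself, not the HTML page.
        res2 = requests.get(link)
        res2.raise_for_status()
        with open(os.path.join('photos', os.path.basename(link)), 'wb') as plik:
            # Write the image response (res2), not the page response (res).
            for chunk in res2.iter_content(100000):
                plik.write(chunk)
        print('downloaded %s' % os.path.basename(link))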

That (especially the identical file sizes) sounds an awful lot like a 404 or other error. Have you looked at the files in a text editor? — elixenide, Mar 24 '19 at 20:43
Hmm, when I open it with a text editor I can see some HTML containing text from the main site (I don't really know much about HTML and CSS, to be honest). It looks like it didn't download the picture. Do you think something is wrong with my select() call? — Ensien, Mar 24 '19 at 20:48
I don't know much about BeautifulSoup, so I don't know where your error is. I do know, though, that if you are trying to download images and are getting HTML then something is wrong with how you're requesting the images. This is a good candidate for some old-fashioned debugging (walk through the program step-by-step and inspect variables to make sure you're doing what you think you're doing). — elixenide, Mar 24 '19 at 20:51
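
The inspection suggested in these comments can also be done in code rather than in a text editor. Below is a minimal sketch, not part of the original thread: `describe_payload` is a hypothetical helper name, and the HTML check is a rough heuristic that only looks at the first 512 bytes. It relies on the fact that JPEG files begin with the bytes FF D8 FF.

```python
def describe_payload(data: bytes) -> str:
    """Roughly classify downloaded bytes as 'jpeg', 'html', or 'unknown'."""
    # JPEG files start with the marker bytes FF D8 FF.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # Heuristic: an HTML page usually shows a doctype or <html> tag early on.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or b"<html" in head:
        return "html"
    return "unknown"
```

Reading one of the saved files with `open(path, 'rb').read()` and passing the bytes to this function should return 'html' for the files described in the question, which would confirm that the page itself was saved instead of the picture.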

score 0 · Accepted Answer · answered Mar 24 '19 at 20:56

After reading the HTML page response and extracting the src of each image, you have to use that src to make another HTTP(S) request and stream the image from that URL.

At the moment, it appears that you are still reading from the initial response, which holds the HTML page rather than the image.

Note: browsers work the same way. For each image or link on a page, they make a further HTTP request.
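
The two steps described above can be sketched roughly as follows. This is an illustration under assumptions, not the asker's final code: `image_requests` and `save_body` are hypothetical helper names, and `response` is assumed to be an object with an `iter_content` method, such as the result of a second `requests.get(absolute_url)` call.

```python
import os
from urllib.parse import urljoin, urlsplit


def image_requests(page_url, srcs):
    """Yield (absolute_url, filename) for every non-empty src value."""
    for src in srcs:
        if not src:
            continue  # skip <img> tags that have no src attribute
        # urljoin handles relative, root-relative and absolute src values.
        absolute = urljoin(page_url, src)
        # Build the filename from the URL path only, ignoring any query string.
        filename = os.path.basename(urlsplit(absolute).path)
        yield absolute, filename


def save_body(response, path, chunk_size=100000):
    """Write the body of the given (image) response to path in chunks."""
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size):
            f.write(chunk)
```

With requests, each `absolute_url` would be passed to its own `requests.get(...)` call, and that new response (not the page response) would be handed to `save_body`.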

Gro


You were right. I edited the post with the right code. Thank you! – Ensien Mar 24 '19 at 21:10
