7

[Image: https://i.stack.imgur.com/S1BR2.png]

import requests
from bs4 import BeautifulSoup

r = requests.get("xxxxxxxxx")
soup = BeautifulSoup(r.content, "html.parser")

links = soup.find_all("img")  # collect all the <img> tags
for link in links:
    if "http" in link.get('src'):
        print(link.get('src'))

I get the printed URLs, but I don't know how to actually download the images from them.

Fist Heart
    BeautifulSoup is for parsing HTML, `requests` is for making requests over HTTP. Downloading falls into the latter category. `requests.get` that URL and then check the documentation on how to save the body of the response. – Alex Hall May 11 '16 at 09:25
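
Following that comment, here is a minimal sketch of saving the body of the response to disk; the image URL and filename are placeholders, not taken from the question:

import requests

# placeholder URL -- substitute one of the src values printed above
img_url = "https://example.com/some-image.jpg"

response = requests.get(img_url)
response.raise_for_status()  # fail loudly if the request didn't succeed

# the raw image bytes are in response.content; write them out in binary mode
with open("some-image.jpg", "wb") as f:
    f.write(response.content)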

2 Answers

13

You need to download and write to disk:

import requests
from bs4 import BeautifulSoup
from os.path import basename

r = requests.get("xxx")
soup = BeautifulSoup(r.content, "html.parser")

links = soup.find_all("img")
for link in links:
    if "http" in link.get('src'):
        lnk = link.get('src')
        # fetch the image and write the raw bytes to a local file
        # named after the last path component of the URL
        with open(basename(lnk), "wb") as f:
            f.write(requests.get(lnk).content)

You can also use a CSS select to filter the tags so you only get the ones with http links:

for link in soup.select('img[src^="http"]'):
    lnk = link["src"]
    with open(basename(lnk), "wb") as f:
        f.write(requests.get(lnk).content)
Padraic Cunningham
1

While the other answers are perfectly correct, I found downloading this way really slow, and with really high-resolution images there was no way to see the progress.

So I made this one:

from bs4 import BeautifulSoup
import requests
import subprocess

url = "https://example.site/page/with/images"
html = requests.get(url).text       # get the html
soup = BeautifulSoup(html, "lxml")  # give the html to soup

# get all the anchor links with the custom class;
# the element or the class name will change based on your case
imgs = soup.find_all("a", {"class": "envira-gallery-link"})
for img in imgs:
    imgUrl = img['href']            # get the href from the tag
    cmd = ['wget', imgUrl]          # just download it using wget
    subprocess.Popen(cmd)           # start the download in the background (parallel)
    # if you don't want to run the downloads in parallel and would rather
    # wait for each image to finish, use communicate() instead:
    # subprocess.Popen(cmd).communicate()

Warning: It won't work on win/mac as it uses wget.

Bonus: You can see the progress of each image if you are not using communicate.
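
If wget isn't an option, here is a minimal cross-platform sketch of the same idea using only requests with stream=True, which also prints rough per-image progress; the URL list and chunk size below are placeholders, not part of this answer:

import requests
from os.path import basename

# placeholder list -- in practice these would be the hrefs collected above
image_urls = ["https://example.site/images/photo1.jpg"]

for img_url in image_urls:
    with requests.get(img_url, stream=True) as r:
        r.raise_for_status()
        total = int(r.headers.get("Content-Length", 0))  # 0 if the size is unknown
        done = 0
        with open(basename(img_url), "wb") as f:
            for chunk in r.iter_content(chunk_size=64 * 1024):
                f.write(chunk)
                done += len(chunk)
                if total:
                    print(f"\r{basename(img_url)}: {done * 100 // total}%", end="")
        print()  # newline after each finished file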

sreenivas
    launching a subprocess for wget shouldn't be faster than using a python lib for http. – Corey Goldberg Jun 03 '18 at 19:33
  • I can't run this code; I get `FileNotFoundError: [WinError 2] The system cannot find the file specified` when running `subprocess.Popen(cmd)` or `subprocess.Popen(cmd).communicate()` – Farhang Amaji Feb 25 '21 at 17:01