Let's say I'm scraping a webpage and want to select a particular image on it. Just as you can find elements by their class name, I want to select an image by its src attribute. How would I select an image when I already know its src value?

For example, I want to select the image whose src attribute is:

https://assets.bandsintown.com/images/pin.svg
QHarr
DiamondJoe12

4 Answers

5

You can search by arbitrary attributes; this should work:

soup.find_all("img", {"src": "https://assets.bandsintown.com/images/pin.svg"})
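
As a minimal, self-contained sketch of the same idea: the HTML fragment below is made up for illustration (the second image and the alt texts are assumptions, not taken from the real page), and the filter is written in keyword form, which should behave the same as passing a dict.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, used only for illustration.
html = """
<div>
  <img src="https://assets.bandsintown.com/images/pin.svg" alt="pin">
  <img src="https://example.com/other.png" alt="other">
</div>
"""

soup = BeautifulSoup(html, "html.parser")
target = "https://assets.bandsintown.com/images/pin.svg"

# Keyword form of the attribute filter; a plain string is compared
# for an exact match against the attribute value.
matches = soup.find_all("img", src=target)

for img in matches:
    print(img["src"], img.get("alt"))
```

find_all always returns a list (empty when nothing matches), so the loop is safe even if the image is missing from the page.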
Blorgbeard
1

While @Blorgbeard's answer shows the BeautifulSoup approach, with Selenium you can achieve the same result using either of the following locator strategies. The find_elements_by_* helpers were removed in Selenium 4, so both calls use find_elements with By, imported via `from selenium.webdriver.common.by import By`:

  • CSS selector:

    my_elements = driver.find_elements(By.CSS_SELECTOR, '[src="https://assets.bandsintown.com/images/pin.svg"]')
    
  • XPath:

    my_elements = driver.find_elements(By.XPATH, '//*[@src="https://assets.bandsintown.com/images/pin.svg"]')
    
undetected Selenium
0

With BeautifulSoup you can do this in several ways, for example with a CSS selector or with a regular expression.

CSS selector

for item in soup.select('img[src="https://assets.bandsintown.com/images/pin.svg"]'):
   print(item['src'])

Regular Expression with find_all

import re
for item in soup.find_all('img',src=re.compile('https://assets.bandsintown.com/images/pin.svg')):
   print(item['src'])
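
Since an unescaped `.` in a pattern matches any character, a rough sketch of two safer variants may help: re.escape for a literal match, and a deliberately looser pattern for matching a family of URLs. The HTML fragment, including the look-alike and cdn URLs, is hypothetical and exists only to show the difference.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment: the second and third URLs are invented for the demo.
html = """
<img src="https://assets.bandsintown.com/images/pin.svg">
<img src="https://assetsXbandsintown.com/images/pin.svg">
<img src="https://cdn.bandsintown.com/icons/v2/pin.svg">
"""
soup = BeautifulSoup(html, "html.parser")

# re.escape makes every "." literal, so only the real URL matches.
exact = re.compile(re.escape("https://assets.bandsintown.com/images/pin.svg"))
exact_hits = soup.find_all("img", src=exact)

# A looser pattern: any subdomain and any path ending in /pin.svg.
loose = re.compile(r"https://.*\.bandsintown\.com/.*/pin\.svg")
loose_hits = soup.find_all("img", src=loose)

print([img["src"] for img in exact_hits])
print([img["src"] for img in loose_hits])
```

For a plain exact match, passing the string itself is simpler; the regex form earns its place only when the pattern really varies.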
KunduK
  • I'm not sure what the point of using a regex for an exact string match is. Those `.` are supposed to be literal, but regex will interpret that as "any character". – Blorgbeard Apr 29 '19 at 21:13
  • Here the regex will search the entire string. However, this is an option if any future readers need to search for a particular value by regex. – KunduK Apr 29 '19 at 21:28
  • Yes, it's useful to know you can use regex that way (I didn't actually know that myself), but it doesn't really make sense with the given example. Something like `src=re.compile('https://.*\.bandsintown\.com/.*/pin\.svg')` might be a better illustration of what you could do with it. – Blorgbeard Apr 29 '19 at 22:09
0

You said a single image by its src value. Use select_one. Less work and you only need an attribute selector.

soup.select_one('[src="https://assets.bandsintown.com/images/pin.svg"]')['src']
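
One caveat worth sketching: select_one returns None when nothing matches, so indexing the result directly raises a TypeError on a page without the image. The helper below is a hypothetical wrapper (find_src is not part of BeautifulSoup), and it assumes the URL contains no double quotes, since it is interpolated into the selector as-is.

```python
from bs4 import BeautifulSoup

# Hypothetical one-image fragment for illustration.
html = '<p><img src="https://assets.bandsintown.com/images/pin.svg"></p>'
soup = BeautifulSoup(html, "html.parser")


def find_src(soup, url):
    """Return the src of the first <img> whose src equals url, or None."""
    # Assumes url has no double quotes; it is placed in the selector verbatim.
    tag = soup.select_one(f'img[src="{url}"]')
    return tag["src"] if tag is not None else None


found = find_src(soup, "https://assets.bandsintown.com/images/pin.svg")
missing = find_src(soup, "https://example.com/missing.svg")

print(found)
print(missing)
```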
QHarr