
I am scraping this link to get the image URLs:

https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds

from urllib.request import urlopen
from bs4 import BeautifulSoup
import json


AMEXurl = ['https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds']
identity = ['filmstrip_container']

html_1 = urlopen(AMEXurl[0])
soup_1 = BeautifulSoup(html_1,'lxml')
address = soup_1.find('div',attrs={"class" : identity[0]})

for x in address.find_all('div', class_ = 'filmstrip-imgContainer'):
    print(x.find('div').get('img'))

but I am getting the following output:

None
None
None
None
None
None
None

The following is a screenshot of the HTML from which I am trying to get the image URLs:

[screenshot: html code from where image urls are being fetched]

This is the section of the page from which I'd like to get the URLs:

[screenshot: image urls]

I'd like to know what changes need to be made in the code so that I get all the image URLs.

Ali Baba

2 Answers


They are dynamically loaded from a script tag. You can easily regex them from the `.text` of the response. The regex below specifically matches the 7 images you want to retrieve, the ones shown in your picture.

import requests, re

r = requests.get('https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds').text
p = re.compile(r'imgurl":"(.*?)"')
links = p.findall(r)
print(links)

Regex explanation: `imgurl":"` matches the literal key, and the lazy group `(.*?)` captures as few characters as possible, i.e. everything up to the next `"` (the end of each URL).
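As a sanity check, the pattern can be exercised on a synthetic stand-in for the page source (the sample JSON below is made up; only the `imgurl` key follows the real page):

```python
import re

# Synthetic stand-in for the page source: the real page embeds card data
# as JSON inside a <script> tag, with each card carrying an "imgurl" key.
sample = (
    '<script>window.__REDUX_STATE__ = {"cards":['
    '{"imgurl":"https://example.com/Platinum_Card.png"},'
    '{"imgurl":"https://example.com/Gold_Card.png"}]}</script>'
)

# The lazy (.*?) stops at the first closing quote, so each URL is captured whole.
p = re.compile(r'imgurl":"(.*?)"')
links = p.findall(sample)
print(links)
# ['https://example.com/Platinum_Card.png', 'https://example.com/Gold_Card.png']
```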


Were you to decide to go with the more expensive Selenium, you could match with

links = [i['src'] for i in driver.find_elements_by_css_selector('.filmstrip-imgContainer img')]
QHarr
  • I'd be glad if you could help me in getting the Apply Now as well as Learn More urls as well if you don't mind. – Ali Baba Feb 11 '21 at 08:33

Try this

from bs4 import BeautifulSoup
import json
import requests
import re

AMEXurl = ['https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds']
identity = ['filmstrip_container']

r = requests.get(AMEXurl[0])
soup_1 = BeautifulSoup(r.content, 'lxml')

Extracting All Images

images = soup_1.find_all('img', src=True)

for img in images:
    print(img['src'])

All image tags that display .png files:

platinum_card_image = soup_1.find('img', src=re.compile(r'Platinum_Card\.png$'))
print(platinum_card_image.get('src'))

All image tags that display .svg files:

platinum_card_image = soup_1.find_all('img', src=re.compile(r'\.svg$'))

for img in platinum_card_image:
    print(img.get('src'))
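The `$` anchor matters here: without it the pattern would also match `.svg` appearing mid-URL, e.g. inside a query string. A minimal sketch on hypothetical `src` values (the paths below are made up):

```python
import re

# Hypothetical src values, standing in for what the img tags would yield.
srcs = [
    '/content/dam/icons/benefit.svg',
    '/content/dam/cards/Platinum_Card.png',
    '/render?file=icon.svg&size=2x',   # '.svg' mid-URL, should not match
]

# Anchored search: only strings that actually end in .svg survive.
svg_only = [s for s in srcs if re.search(r'\.svg$', s)]
print(svg_only)  # ['/content/dam/icons/benefit.svg']
```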

Edit

# The card data is embedded as JSON in the page's 9th <script> tag,
# assigned to window.__REDUX_STATE__; everything after the '=' is plain JSON.
images_7 = soup_1.select('script')[8].string.split('__REDUX_STATE__ = ')
data = images_7[1]

for d in json.loads(data)["modelData"]['componentFeaturedCards']['cards']:
    print(d['imgurl'])
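The split-and-parse step can be illustrated on a synthetic script body (the `modelData`/`componentFeaturedCards`/`cards` keys follow the answer above; the sample data itself is made up):

```python
import json

# Synthetic stand-in for soup_1.select('script')[8].string on the real page.
script_text = (
    'window.__REDUX_STATE__ = '
    '{"modelData": {"componentFeaturedCards": {"cards": ['
    '{"imgurl": "https://example.com/Platinum_Card.png"},'
    '{"imgurl": "https://example.com/Gold_Card.png"}]}}}'
)

# Everything after the assignment is valid JSON, so it parses directly.
data = script_text.split('__REDUX_STATE__ = ')[1]
urls = [d['imgurl']
        for d in json.loads(data)['modelData']['componentFeaturedCards']['cards']]
print(urls)
# ['https://example.com/Platinum_Card.png', 'https://example.com/Gold_Card.png']
```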
Samsul Islam
  • Thanks a lot for your code, but unfortunately I am not able to get all the image urls of all the 7 cards. – Ali Baba Feb 11 '21 at 08:01