
I am scraping this link to get the image URLs:

https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds

from urllib.request import urlopen
from bs4 import BeautifulSoup
import json


AMEXurl = ['https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds']
identity = ['filmstrip_container']

html_1 = urlopen(AMEXurl[0])
soup_1 = BeautifulSoup(html_1,'lxml')
address = soup_1.find('div',attrs={"class" : identity[0]})

for x in address.find_all('div', class_ = 'filmstrip-imgContainer'):
    print(x.find('div').get('img'))

but I am getting the following output:

None
None
None
None
None
None
None

The following is a screenshot of the HTML from which I am trying to get the image URLs:

[screenshot: html code from where image urls are being fetched]

This is the section of the page from which I'd like to get the URLs:

[screenshot: image urls]

I'd like to know what changes need to be made in the code so that I get all the image URLs.

Ali Baba

2 Answers


They are dynamically loaded from a script tag. You can easily regex them from the `.text` of the response. The regex below specifically matches the 7 images you want to retrieve, the ones shown in your picture.

import requests, re

r = requests.get('https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds').text
p = re.compile(r'imgurl":"(.*?)"')
links = p.findall(r)
print(links)

Regex explanation: `imgurl":"` matches the literal key, and the lazy group `(.*?)` captures as few characters as possible, i.e. everything up to the next `"` (the end of each URL).
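As a sanity check, the pattern can be exercised on a synthetic stand-in for the page source (the sample JSON below is made up; only the `imgurl` key follows the real page):

```python
import re

# Synthetic stand-in for the page source: the real page embeds card data
# as JSON inside a <script> tag, with each card carrying an "imgurl" key.
sample = (
    '<script>window.__REDUX_STATE__ = {"cards":['
    '{"imgurl":"https://example.com/Platinum_Card.png"},'
    '{"imgurl":"https://example.com/Gold_Card.png"}]}</script>'
)

# The lazy (.*?) stops at the first closing quote, so each URL is captured whole.
p = re.compile(r'imgurl":"(.*?)"')
links = p.findall(sample)
print(links)
# ['https://example.com/Platinum_Card.png', 'https://example.com/Gold_Card.png']
```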


Were you to decide to go with the more expensive Selenium, you could match with

links = [i['src'] for i in driver.find_elements_by_css_selector('.filmstrip-imgContainer img')]
QHarr
  • I'd be glad if you could help me in getting the Apply Now as well as Learn More urls as well if you don't mind. – Ali Baba Feb 11 '21 at 08:33

Try this

from bs4 import BeautifulSoup
import json
import requests
import re

AMEXurl = ['https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds']
identity = ['filmstrip_container']

r = requests.get(AMEXurl[0])
soup_1 = BeautifulSoup(r.content, 'lxml')

Extracting All Images

images = soup_1.find_all('img', src=True)

for img in images:
    print(img['src'])

All image tags that display .png files:

platinum_card_image = soup_1.find('img', src=re.compile(r'Platinum_Card\.png$'))
print(platinum_card_image.get('src'))

All image tags that display .svg files:

platinum_card_image = soup_1.find_all('img', src=re.compile(r'\.svg$'))

for img in platinum_card_image:
    print(img.get('src'))
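The `$` anchor matters here: without it the pattern would also match `.svg` appearing mid-URL, e.g. inside a query string. A minimal sketch on hypothetical `src` values (the paths below are made up):

```python
import re

# Hypothetical src values, standing in for what the img tags would yield.
srcs = [
    '/content/dam/icons/benefit.svg',
    '/content/dam/cards/Platinum_Card.png',
    '/render?file=icon.svg&size=2x',   # '.svg' mid-URL, should not match
]

# Anchored search: only strings that actually end in .svg survive.
svg_only = [s for s in srcs if re.search(r'\.svg$', s)]
print(svg_only)  # ['/content/dam/icons/benefit.svg']
```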

Edit

# The card data is embedded as JSON in the page's 9th <script> tag,
# assigned to window.__REDUX_STATE__; everything after the '=' is plain JSON.
images_7 = soup_1.select('script')[8].string.split('__REDUX_STATE__ = ')
data = images_7[1]

for d in json.loads(data)["modelData"]['componentFeaturedCards']['cards']:
    print(d['imgurl'])
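The split-and-parse step can be illustrated on a synthetic script body (the `modelData`/`componentFeaturedCards`/`cards` keys follow the answer above; the sample data itself is made up):

```python
import json

# Synthetic stand-in for soup_1.select('script')[8].string on the real page.
script_text = (
    'window.__REDUX_STATE__ = '
    '{"modelData": {"componentFeaturedCards": {"cards": ['
    '{"imgurl": "https://example.com/Platinum_Card.png"},'
    '{"imgurl": "https://example.com/Gold_Card.png"}]}}}'
)

# Everything after the assignment is valid JSON, so it parses directly.
data = script_text.split('__REDUX_STATE__ = ')[1]
urls = [d['imgurl']
        for d in json.loads(data)['modelData']['componentFeaturedCards']['cards']]
print(urls)
# ['https://example.com/Platinum_Card.png', 'https://example.com/Gold_Card.png']
```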
Samsul Islam
  • Thanks a lot for your code, but unfortunately I am not able to get all the image urls of all the 7 cards. – Ali Baba Feb 11 '21 at 08:01