This page is rendered dynamically with JavaScript, so using bs4 alone
will not work here (see the page source). I would recommend using selenium
to load the page, grab the rendered page source, and pass it to a BeautifulSoup object. Here is sample code to do this:
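from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.empireonline.com/movies/features/best-movies-2'

# Run Chrome without opening a window
chrome_options = Options()
chrome_options.add_argument('--headless')

# webdriver_manager downloads a matching chromedriver and returns its path
# (the service= keyword is the Selenium 4 way of passing it)
service = Service(ChromeDriverManager().install())

# Leaving the with block quits the browser, so no explicit close() is needed
with webdriver.Chrome(service=service, options=chrome_options) as driver:
    # Retrieve url in headless browser
    driver.get(url)
    # Wait up to 10 seconds for the JavaScript-rendered headings to appear
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, 'h3'))
    )
    html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')

# The jsx-* class names appear to be auto-generated and may change over time
titles = soup.find_all(name='h3', class_='jsx-2692754980')
titles = [t.text for t in titles if t.text]
print(titles)

imgs = soup.find('div', class_='jsx-3821216435').find_all('img')
print(imgs)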
The results for titles and imgs are:
titles: ['100) Stand By Me', '99) Raging Bull', '98) Amelie', '97) Titanic', '96) Good Will Hunting', '95) Arrival', '94) Lost In Translation' ... ]
imgs: [<img alt="Stand By Me" class="jsx-952983560 loading" data-src="//cdn.onebauer.media/one/media/5e62/24d4/08ba/aa5a/8143/279c/stand-by-me.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit" src="" title=""/>, <img alt="Raging Bull" class="jsx-952983560 loading" data-src="//cdn.onebauer.media/one/media/5d2d/d990/853e/7cd6/60cc/fa2e/raging-bull.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit" src="" title=""/>, ... ]
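Two things stand out in the results above: each title still carries its rank ("100) ..."), and the images are lazy-loaded, so src is empty and the real address sits in data-src as a protocol-relative URL. If you want clean values, something like the following might help. It is only a sketch: split_rank and image_urls are hypothetical helper names, not part of bs4 or selenium, and plain dicts stand in for the img tags so it runs without a browser.

```python
import re
from urllib.parse import urljoin

PAGE_URL = 'https://www.empireonline.com/movies/features/best-movies-2'


def split_rank(text):
    """Split a string like '100) Stand By Me' into (100, 'Stand By Me')."""
    match = re.match(r'\s*(\d+)\)\s*(.+)', text)
    if match is None:
        # No leading "N)" prefix: return the text unchanged, with no rank.
        return None, text.strip()
    return int(match.group(1)), match.group(2).strip()


def image_urls(tags, base_url=PAGE_URL):
    """Return absolute image URLs, preferring the lazy-load data-src value."""
    urls = []
    for tag in tags:
        # Lazy-loaded images keep the real address in data-src; src is empty.
        src = tag.get('data-src') or tag.get('src')
        if src:
            # urljoin fills in the scheme for '//host/path' style URLs.
            urls.append(urljoin(base_url, src))
    return urls


# Plain dicts stand in for bs4 Tag objects here; both support .get(name).
sample_imgs = [
    {'alt': 'Stand By Me', 'src': '',
     'data-src': '//cdn.onebauer.media/one/media/5e62/24d4/08ba/aa5a/8143/279c/'
                 'stand-by-me.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit'},
    {'alt': 'No image yet', 'src': ''},
]

print(split_rank('100) Stand By Me'))
print(image_urls(sample_imgs))
```

With the real results you could call [split_rank(t) for t in titles] and image_urls(imgs), since bs4 tags expose .get() for attributes the same way the dicts do here.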
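Note that you need to pip install selenium, beautifulsoup4 and webdriver-manager.
Because the code uses webdriver_manager, chromedriver is downloaded for you automatically.
If you would rather not use it, download chromedriver yourself and put it in the same directory as the script.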