
I want to scrape the poster images of the first 100 movies on IMDb. The scrape seems to succeed, but it gives me the wrong URL.

IMDb page: https://www.imdb.com/search/title/?count=100&groups=top_1000&sort=user_rating

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.imdb.com/search/title/?count=100&groups=top_1000&sort=user_rating'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # One <div> per movie in the result list
    movie_data = soup.find_all('div', attrs={'class': 'lister-item mode-advanced'})

    for store in movie_data:
        imageDiv = store.find('div', {'class': 'lister-item-image float-left'})
        img = imageDiv.a.img['src']

But img always ends up with the wrong URL.

Tim Roberts
TangPing

1 Answer


When doing web scraping, you need to look at the HTML to see what it's doing. All of those images load the fake "movie cell" image to start with. That's the src attribute in their <img> tag, and that's exactly what you're fetching.

The actual movie thumbnail is stored in a loadlate attribute, which gets fetched by JavaScript after the page loads. This lets the page load more quickly and fill in the images later.
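One way to check this yourself is to print every attribute of a single <img> tag. The sketch below runs on a hand-written HTML fragment modeled on the markup described above; SAMPLE_HTML and the example.com URLs are made up for illustration and are not copied from IMDb.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the markup described above;
# the URLs are placeholders, not real IMDb addresses.
SAMPLE_HTML = """
<div class="lister-item mode-advanced">
  <div class="lister-item-image float-left">
    <a href="/title/tt0000001/">
      <img alt="Example Movie"
           src="https://example.com/placeholder.png"
           loadlate="https://example.com/poster-1.jpg">
    </a>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
img_tag = soup.find("img")

# Tag.attrs is a dict holding every attribute on the tag,
# so both src and loadlate show up side by side.
for name, value in img_tag.attrs.items():
    print(f"{name}: {value}")
```

Seeing both attributes next to each other makes it clear which one holds the placeholder and which one holds the real thumbnail.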

So, use this instead:

    img = imageDiv.a.img['loadlate']
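If some <img> tags might not carry a loadlate attribute, indexing with square brackets raises a KeyError. Here is a minimal sketch of a more defensive version, assuming you would rather fall back to src than crash. The function name extract_poster_urls and the sample HTML are hypothetical, chosen only for this example.

```python
from bs4 import BeautifulSoup


def extract_poster_urls(html):
    """Return one image URL per movie item, preferring loadlate over src."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for image_div in soup.find_all("div", class_="lister-item-image"):
        img_tag = image_div.find("img")
        if img_tag is None:
            continue  # no image in this item
        # Tag.get returns None instead of raising when the attribute is absent.
        url = img_tag.get("loadlate") or img_tag.get("src")
        if url:
            urls.append(url)
    return urls


# Hypothetical markup: the first item is lazy-loaded, the second is not.
SAMPLE_HTML = """
<div class="lister-item-image float-left">
  <a href="/title/tt0000001/"><img src="https://example.com/placeholder.png"
     loadlate="https://example.com/poster-1.jpg"></a>
</div>
<div class="lister-item-image float-left">
  <a href="/title/tt0000002/"><img src="https://example.com/poster-2.jpg"></a>
</div>
"""

print(extract_poster_urls(SAMPLE_HTML))
```

With real pages you would pass response.content from requests.get(url) in place of SAMPLE_HTML.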
SuperStormer
Tim Roberts