I am working on a project where I need to create a movie database. I have created the database and imported the IMDb links that redirect to each movie's webpage. I would also like to add the main image/thumbnail of each movie so that I can then use the CSV in Power BI. However, I did not manage to do it.

I have tried this:

import requests
from bs4 import BeautifulSoup
import numpy as np

images = []
for i in df_database_url['Url Film']:
    r = requests.get(i)
    soup = BeautifulSoup(r.content, "html.parser")
    images.append(image_url)

But my goal is to have a column that includes the thumbnail for each movie.

Mr K.

1 Answer

Assuming that i is an IMDb movie URL (the kind that starts with https://www.imdb.com/title), you can target the script tag that seems to contain most of the main information about the movie. Inside your loop, before images.append(image_url), you can get the image with

import json
image_url = json.loads(soup.select_one('script[type="application/ld+json"]').text)['image']

or, if we're more cautious:

import json

scCont = [s.text for s in soup.select('script[type="application/ld+json"]') if '"image"' in s.text]
if scCont:
    try:
        scCont = json.loads(scCont[0])
        if 'image' not in scCont: 
            image_url = None
            print('No image found for', i)
        else: image_url = scCont['image']
    except Exception as e: 
        image_url = None
        print('Could not parse movie info for', i, '\n', str(e))
else:
    image_url = None
    print('Could not find script with movie info for', i)

(and you can get the trailer thumbnail with scCont['trailer']['thumbnailUrl'])

This way, instead of raising an error when anything on the path to the expected info is unavailable, it just sets image_url to None. If you want it to halt and raise an error in such cases, use the first version.
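One more case may be worth guarding against. This is a sketch under the assumption that the script follows schema.org JSON-LD, where image is allowed to be a plain URL string, an ImageObject with a url key, or a list of either. The helper name image_url_from_ld is hypothetical and not part of any library; it takes the script text and returns a URL string or None.

```python
import json


def image_url_from_ld(ld_json_text):
    """Return the image URL from a JSON-LD string, or None if absent or unparseable."""
    try:
        data = json.loads(ld_json_text)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON.
        return None
    if not isinstance(data, dict):
        return None
    image = data.get("image")
    # A list of images: use the first entry, if there is one.
    if isinstance(image, list):
        image = image[0] if image else None
    # An ImageObject: the URL is stored under "url".
    if isinstance(image, dict):
        image = image.get("url")
    return image if isinstance(image, str) else None
```

Inside the loop it could be fed the text of the script tag selected above, for example image_url = image_url_from_ld(script_tag.text), where script_tag is whatever soup.select_one returned.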


Then, after the loop, you can add the column with something like

df_database_url['image_urls'] = images

(you probably know that...)
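Putting the pieces together, the whole loop might look roughly like the sketch below. The function names (fetch_html, poster_from_html, collect_image_urls), the User-Agent value, and the 10-second timeout are my own assumptions, not anything from the question; the browser-like header is there only because some sites appear to reject the default requests user agent. The fetch parameter exists so the loop can be tried on saved HTML without hitting the network.

```python
import json

import requests
from bs4 import BeautifulSoup

# Assumption: a browser-like User-Agent; adjust or drop it as needed.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def poster_from_html(html):
    """Return the 'image' value of the first JSON-LD script that has one, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.select('script[type="application/ld+json"]'):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # skip scripts that are empty or not valid JSON
        if isinstance(data, dict) and data.get("image"):
            return data["image"]
    return None


def collect_image_urls(urls, fetch=fetch_html):
    """Return one image URL (or None) per input URL, in the same order."""
    images = []
    for url in urls:
        try:
            images.append(poster_from_html(fetch(url)))
        except requests.RequestException as exc:
            # Keep the list aligned with the input even when a download fails.
            print("Could not download", url, "\n", exc)
            images.append(None)
    return images
```

With that in place, the column assignment would become df_database_url['image_urls'] = collect_image_urls(df_database_url['Url Film']). Because a None is appended for every failure, the list always has the same length as the column it came from.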

Driftr95