
I'm trying to extract a dataset with the top 20 movies and each movie's genres and actors. For that I'm trying the following code:

top250 = ia.get_top250_movies()
limit = 20;
index = 0;
output = []
for item in top250:
    for genre in top250['genres']:
        index += 1;
        if index <= limit:
            print(item['long imdb canonical title'], ": ", genre);
        else:
            break;

I'm getting the following error:

Traceback (most recent call last):
  File "C:/Users/avilares/PycharmProjects/IMDB/IMDB.py", line 21, in <module>
    for genre in top250['genres']:
TypeError: list indices must be integers or slices, not str

I think the top250 object doesn't contain the genres...

Does anyone know how to get the genres of each movie?

Many thanks!

Pedro Alves
  • "The object top250" appears to be a *list* of movie objects, so you need to iterate over each one and access its genres. Maybe look into a `set` or `collections.Counter` for storing the unique genres seen. – jonrsharpe Sep 19 '18 at 11:09
  • What is the output if you try printing `top250`? From the error, it seems like it's a list and thus can't be accessed in the way you attempted (which would work with a dict) – rdimaio Sep 19 '18 at 11:15
  • @rdimaio I'm trying to get the name of the movie and each genre – Pedro Alves Sep 19 '18 at 11:17
  • @PedroAlves Try the code I posted in my answer, let me know if it works for you – rdimaio Sep 19 '18 at 11:32

2 Answers


From the IMDbPY docs:

"It’s possible to retrieve the list of top 250 and bottom 100 movies:"

>>> top = ia.get_top250_movies()
>>> top[0]
<Movie id:0111161[http] title:_The Shawshank Redemption (1994)_>
>>> bottom = ia.get_bottom100_movies()
>>> bottom[0]
<Movie id:4458206[http] title:_Code Name: K.O.Z. (2015)_>

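get_top250_movies() returns a list, so you can't index it with a string such as 'genres', and the items in that list don't carry the genres anyway. You need to fetch each movie's full details first.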
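To see why the original loop fails, here is a minimal sketch that needs no IMDb access. Plain dictionaries are used as hypothetical stand-ins for the Movie objects: indexing the list with a string raises the same TypeError as in the question, while indexing each item works.

```python
# Hypothetical stand-ins for the lightweight entries returned by
# get_top250_movies(); plain dicts are used instead of Movie objects.
top250 = [
    {'title': 'The Shawshank Redemption', 'year': 1994},
    {'title': 'The Godfather', 'year': 1972},
]

# Indexing the *list* with a string fails, exactly as in the question.
try:
    top250['title']
except TypeError as err:
    print(err)  # list indices must be integers or slices, not str

# Indexing each *item* with a string works.
titles = [item['title'] for item in top250]
print(titles)
```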

Here's a solution:

# Iterate through the movies in the top 250
for topmovie in top250:
    # First, retrieve the movie object using its ID
    movie = ia.get_movie(topmovie.movieID)
    # Print the movie's genres
    for genre in movie['genres']:
        print(genre)  

Full working code:

import imdb

ia = imdb.IMDb()
top250 = ia.get_top250_movies()

# Iterate through the first 20 movies in the top 250
for topmovie in top250[:20]:
    # First, retrieve the full movie object using its ID
    movie = ia.get_movie(topmovie.movieID)
    # Print movie title and genres
    print(movie['title'])
    print(*movie['genres'], sep=", ")

Output:

The Shawshank Redemption
Drama
The Godfather
Crime, Drama
The Godfather: Part II
Crime, Drama
The Dark Knight
Action, Crime, Drama, Thriller
12 Angry Men
Crime, Drama
Schindler's List
Biography, Drama, History
The Lord of the Rings: The Return of the King
Action, Adventure, Drama, Fantasy
Pulp Fiction
Crime, Drama
The Good, the Bad and the Ugly
Western
Fight Club
Drama
The Lord of the Rings: The Fellowship of the Ring
Adventure, Drama, Fantasy
Forrest Gump
Drama, Romance
Star Wars: Episode V - The Empire Strikes Back
Action, Adventure, Fantasy, Sci-Fi
Inception
Action, Adventure, Sci-Fi, Thriller
The Lord of the Rings: The Two Towers
Adventure, Drama, Fantasy
One Flew Over the Cuckoo's Nest
Drama
Goodfellas
Crime, Drama
The Matrix
Action, Sci-Fi
Seven Samurai
Adventure, Drama
City of God
Crime, Drama
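As jonrsharpe suggested in the comments, `collections.Counter` can tally how often each genre appears, and a `set` gives the distinct genres. A minimal sketch, assuming the titles and genres have already been collected into a dictionary (the few entries below are copied from the output above rather than fetched live):

```python
from collections import Counter

# A few title -> genres pairs copied from the output above; in real code
# this dict would be filled from ia.get_movie(...)['genres'].
genres_by_title = {
    'The Shawshank Redemption': ['Drama'],
    'The Godfather': ['Crime', 'Drama'],
    'The Dark Knight': ['Action', 'Crime', 'Drama', 'Thriller'],
    'The Good, the Bad and the Ugly': ['Western'],
    'The Matrix': ['Action', 'Sci-Fi'],
}

# Count every genre occurrence across all movies.
genre_counts = Counter(
    genre for genres in genres_by_title.values() for genre in genres
)
print(genre_counts.most_common(3))

# The Counter's keys are the distinct genres.
unique_genres = set(genre_counts)
print(sorted(unique_genres))
```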
rdimaio

Here is shorter, more Pythonic code; the notebook can be accessed here.

Python provides some cleaner ways to express this kind of code. In this script, I have used two such techniques.

Technique-1: List comprehension

A list comprehension loops through an iterable and produces a list as output. It can also apply a computation to each item and a condition to filter items. The other technique, Technique-2: Dictionary comprehension, is very similar; you can read about it here.

For example, code without a list comprehension:

numbers = []
for i in range(10):
    numbers.append(i)
print(numbers)

#Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The same code using a list comprehension:

numbers = [i for i in range(10)]
print(numbers)

#Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
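Since a comprehension can also include a computation and a condition, and a dictionary comprehension follows the same pattern with key: value pairs, here is a short illustration of both:

```python
# Computation (i * i) plus a condition (keep only even i).
even_squares = [i * i for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Dictionary comprehension: produces key: value pairs instead of items.
squares_by_number = {i: i * i for i in range(5)}
print(squares_by_number)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```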

Coming to the OP's problem: get_top250_movies() returns a list of movies with only a few details each. You can check exactly which keys a movie entry has like this; as the output shows, the entries do not include the genres or other details.

from imdb import IMDb
ia = IMDb()
top250Movies = ia.get_top250_movies()
top250Movies[0].items()

#output:
[('rating', 9.2),
 ('title', 'The Shawshank Redemption'),
 ('year', 1994),
 ('votes', 2222548),
 ('top 250 rank', 1),
 ('kind', 'movie'),
 ('canonical title', 'Shawshank Redemption, The'),
 ('long imdb title', 'The Shawshank Redemption (1994)'),
 ('long imdb canonical title', 'Shawshank Redemption, The (1994)'),
 ('smart canonical title', 'Shawshank Redemption, The'),
 ('smart long imdb canonical title', 'Shawshank Redemption, The (1994)')]

The get_movie() function, however, returns much more information about a movie, including its genres.

We combine the two functions to get the genres of the top 20 movies. First we call get_top250_movies(), which returns the list of the top 250 movies with only a few details each (we only need each movie's movieID). Then we call get_movie() for each movieID from that list, which returns the full movie details, including the genres.

Program:

from imdb import IMDb    

#initialize and get the top 250 movies; this list only has
#a few details per movie and doesn't include genres
ia = IMDb()
top250Movies = ia.get_top250_movies()

#TECHNIQUE-1: List comprehension
#fetch the full details (including genres) of the top 20 movies
top20Movies = [ia.get_movie(movie.movieID) for movie in top250Movies[:20]]

#TECHNIQUE-2: Dictionary comprehension
#build a dictionary of movie title: movie genres
#(a notebook displays it automatically; in a plain script, wrap it in print())
{movie['title']:movie['genres'] for movie in top20Movies}

Output:

{'12 Angry Men': ['Drama'],
 'Fight Club': ['Drama'],
 'Forrest Gump': ['Drama', 'Romance'],
 'Goodfellas': ['Biography', 'Crime', 'Drama'],
 'Inception': ['Action', 'Adventure', 'Sci-Fi', 'Thriller'],
 "One Flew Over the Cuckoo's Nest": ['Drama'],
 'Pulp Fiction': ['Crime', 'Drama'],
 "Schindler's List": ['Biography', 'Drama', 'History'],
 'Se7en': ['Crime', 'Drama', 'Mystery', 'Thriller'],
 'Seven Samurai': ['Action', 'Adventure', 'Drama'],
 'Star Wars: Episode V - The Empire Strikes Back': ['Action',
  'Adventure',
  'Fantasy',
  'Sci-Fi'],
 'The Dark Knight': ['Action', 'Crime', 'Drama', 'Thriller'],
 'The Godfather': ['Crime', 'Drama'],
 'The Godfather: Part II': ['Crime', 'Drama'],
 'The Good, the Bad and the Ugly': ['Western'],
 'The Lord of the Rings: The Fellowship of the Ring': ['Action',
  'Adventure',
  'Drama',
  'Fantasy'],
 'The Lord of the Rings: The Return of the King': ['Adventure',
  'Drama',
  'Fantasy'],
 'The Lord of the Rings: The Two Towers': ['Adventure', 'Drama', 'Fantasy'],
 'The Matrix': ['Action', 'Sci-Fi'],
 'The Shawshank Redemption': ['Drama']}
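The question also asked for actors. A minimal sketch, assuming that a full movie object from get_movie() exposes the cast under the 'cast' key as a list of person objects that each have a 'name' key. Plain dicts stand in for the Movie and Person objects here, and the helper name summarize_movie is hypothetical:

```python
def summarize_movie(movie, max_actors=3):
    """Return (title, genres, first few actor names) for one movie.

    Assumes a dict-like movie with 'title', and optionally 'genres' and
    'cast' (a list of dict-like persons, each with a 'name' key).
    """
    # .get() yields empty lists instead of raising KeyError when missing.
    genres = movie.get('genres', [])
    cast = movie.get('cast', [])
    actors = [person['name'] for person in cast[:max_actors]]
    return movie['title'], genres, actors


# Hypothetical stand-in for the object returned by ia.get_movie(...).
example_movie = {
    'title': 'The Shawshank Redemption',
    'genres': ['Drama'],
    'cast': [{'name': 'Tim Robbins'}, {'name': 'Morgan Freeman'}],
}

print(summarize_movie(example_movie))
```

With real data, assuming the Movie object supports dict-style `.get()`, the call would be `summarize_movie(ia.get_movie(movie.movieID))` inside a loop over `top250Movies[:20]`.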
  • While this code may solve the question, [including an explanation](https://meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – Brian61354270 Apr 19 '20 at 17:42
  • Hi Brian, thank you. I will edit my answer and try my best to put it into words. I had only been reading StackOverflow for a long time, and this is the first time I have ever signed in and posted. I hope people don't find my answers odd. – Tamil Tech Guru Apr 20 '20 at 05:56