Stumbled across this problem that I could not find the solution too, need some help or point me in the right direction.
The tables categorize several movies into multiple genres, for example: Toy Story is in the genres; Children's, Comedy and Animation.
I am trying to find out which genres are most clear-cut and which are so muddled that they tend to be listed among several others for a typical movie.
For each genre in the data set, how many genres are the movies in that genre in, on average. For example, a Action movie is in 1.3 genres and the average Adventure movie is in 2.9 genres, so an Action movie is a better defined genre.
Below are samples of the tables but here is a sql editor with the tables and values in them for reference: https://blazerme.herokuapp.com/queries/new
3 tables:
genre table:
id name
1 Action
2 Adventure
3 Animation
4 Children's
5 Comedy
genre_movies table:
id movie_id genre_id
1 1(Toy Story) 3(Animation)
2 1(Toy Story) 4(Children's)
3 1(Toy Story) 5(Comedy)
4 2(GoldenEye) 1(Action)
5 2(GoldenEye) 2(Adventure)
6 2(GoldenEye) 16(Thriller)
movies table:
id title release_date
1 Toy Story (1995) 1995-01-01
2 GoldenEye (1995) 1995-01-01
3 Four Rooms (1995) 1995-01-01
My best effort so far is I try to Avg over the count of the genre_id. See below:
SELECT
name, AVG(c.count)
FROM
(SELECT
g.name AS name, COUNT(gm.genre_id) AS count
FROM
genres g
INNER JOIN
genres_movies gm ON g.id = gm.genre_id
INNER JOIN
movies m ON m.id = gm.movie_id
GROUP BY
g.name) c
GROUP BY
name
But that simply gives me a count of each genre from the genres_movies
table. I can't seem to figure out how to turn this into the average number of genres for movies in each genre.