3

Stumbled across this problem that I could not find the solution too, need some help or point me in the right direction.

The tables categorize several movies into multiple genres, for example: Toy Story is in the genres; Children's, Comedy and Animation.

I am trying to find out which genres are most clear-cut and which are so muddled that they tend to be listed among several others for a typical movie.

For each genre in the data set, how many genres are the movies in that genre in, on average. For example, a Action movie is in 1.3 genres and the average Adventure movie is in 2.9 genres, so an Action movie is a better defined genre.

Below are samples of the tables but here is a sql editor with the tables and values in them for reference: https://blazerme.herokuapp.com/queries/new

3 tables:

genre table:

id  name
1   Action
2   Adventure
3   Animation
4   Children's
5   Comedy

genre_movies table:

id movie_id     genre_id
1  1(Toy Story) 3(Animation)
2  1(Toy Story) 4(Children's)
3  1(Toy Story) 5(Comedy)
4  2(GoldenEye) 1(Action)
5  2(GoldenEye) 2(Adventure)
6  2(GoldenEye) 16(Thriller)

movies table:

id title             release_date
1  Toy Story (1995)  1995-01-01
2  GoldenEye (1995)  1995-01-01
3  Four Rooms (1995) 1995-01-01

My best effort so far is I try to Avg over the count of the genre_id. See below:

SELECT 
    name, AVG(c.count)
FROM 
    (SELECT
         g.name AS name, COUNT(gm.genre_id) AS count
     FROM
         genres g 
     INNER JOIN
         genres_movies gm ON g.id = gm.genre_id
     INNER JOIN
         movies m ON m.id = gm.movie_id
     GROUP BY
         g.name) c
GROUP BY
    name

But that simply gives me a count of each genre from the genres_movies table. I can't seem to figure out how to turn this into the average number of genres for movies in each genre.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
CharlieE
  • 33
  • 1
  • 4
  • 1
    Could you add more examples and expected output in the question? Can't see any record for Horror Movies. – Darshan Mehta May 17 '17 at 23:02
  • See https://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query – Strawberry May 17 '17 at 23:49
  • @DarshanMehta in this case Horror is id = 11 in the genre table. I replaced Horror with Action to clarify a bit. This question is tough for me to understand as well so let me know if you need more clarity. I added the work I have done so far and also some more examples in the tables. – CharlieE May 18 '17 at 00:02

2 Answers2

3

I solve problems like this by first building the innermost query with the main info that I want, and then work outwards.

The core information needed is:

select movie_id, count(*) from genres_movies group by movie_id

The complete query:

select G.name, Round(avg(numgenres),2) as MuddleFactor
from genres_movies GM

inner join
(select movie_id, count(*) as numGenres
 from genres_movies
 group by movie_id) MNG 
 on MNG.movie_id = GM.movie_id

inner join genres G
on GM.genre_id = G.id

group by G.name
order by MuddleFactor desc
seagulledge
  • 314
  • 1
  • 2
  • 7
-1

You need to do a cartesian product (self-join) on the table so that you show for each movie, every combination of two genres. Then the aggregation/stat is more obvious.

OK. Here's the code.

Select Genres.Name As Genre,
        Count(*)*1.0/Count(Distinct A.Movie_Id) As Average_Genres
    From Genres_Movies A
            Inner Join
        Genres_Movies B
                On A.Movie_Id=B.Movie_Id
            Inner Join
        Genres
                On A.Genre_Id=Genres.Id
    Group By Genres.Name
    Order By 2 Desc
Jeff Winchell
  • 103
  • 1
  • 8
  • Yes, but then I would be violating the agreement I made with InstaCart when taking their test. I gave enough of a hint that someone who can abstract data problems well should be able to readily solve. But to give one more hint, the problem is solvable with a single join and a single group by. – Jeff Winchell Aug 11 '18 at 05:13
  • Who is instacart and why should writing an SQL query violate any agreements? Sounds like a pretty bad excuse – Nico Haase Aug 11 '18 at 08:58
  • I looked again at another one of their tests and it says: "I will not consult/copy code from any source including a website, book, or friend/colleague to complete these tests, though may reference language documentation or use an IDE that has code completion features." So, it doesn't ask me to not post solutions. Given the very high level of InstaCart's Data Scientists judging from their tech blogs, if an applicant needs my tip to answer this SQL question, they won't get far into the process anyway. So I will post the code to his question. – Jeff Winchell Aug 11 '18 at 22:58