-3

The first query q1 gives desired result. But query q2 gives excessive large no of movies for a pair of actor and director.

SELECT * FROM 
   (SELECT  pc.PID AS actorID, pc.Name AS Actor, pd.PID AS directorID,pd.Name AS 
    Director,COUNT(DISTINCT m.MID) count_movie FROM Movie m 
    JOIN 
    M_Cast mc ON m.MID = Trim(mc.MID) 
   JOIN  
   M_Director md ON  m.MID = md.MID
   JOIN  
   Person pc ON Trim(mc.PID) = pc.PID
   JOIN 
   Person pd ON md.PID = pd.PID
   WHERE pd.Name LIKE '%Yash%' GROUP BY pc.Name) AS q1

   JOIN

   (SELECT  pc.PID AS actorID,pc.Name Actor,pd.PID AS directorID,pd.Name AS Director, COUNT(DISTINCT 
   m.MID) count_movie FROM Movie m 
   JOIN 
   M_Cast mc ON  m.MID = TRIM(mc.MID ) 
   JOIN  
   M_Director md ON m.MID = md.MID 
   JOIN 
   Person pc ON TRIM(mc.PID) = pc.PID 
   JOIN 
   Person pd  ON md.PID = pd.PID
   WHERE pd.Name NOT LIKE '%Yash%' GROUP BY pc.PID) AS q2
   ON q1.Actor = q2.Actor ORDER BY q1.count_movie DESC
        actorID     Actor          directorID   Director      count_movie  actorID      Actor      directorID    Director       count_movie
   0    nm0707271   Jagdish Raj    nm0007181    Yash Chopra   11           nm0707271    Jagdish Raj nm0474806   Gulshan Kumar   98
   1    nm0471443   Manmohan       nm0007181    Yash Chopra   10           nm0471443    Manmohan    nm0695153   T. Prakash Rao  39
   2    nm0407002   Iftekhar       nm0007181    Yash Chopra    9           nm0407002    Iftekhar    nm1065099   C.P. Dixit      93
   3    nm0534501   Madan Puri     nm0007181    Yash Chopra    8           nm0534501    Madan Puri  nm0619050   Hiren Nag       94

The movie_count on extreme right seems to be unreasonably high. It should be less than 11 for the first row if it has to match the answer.

Here i have not used the above WHERE condition just to show where my query is wrong. Even if i use the Where condition

where q1.count_movie > q2.count_movie

The result which i get is

        Name            count
    0   Sanjeev Kumar   3
    1   Sanjeev Kumar   3
    2   Sanjeev Kumar   3
    3   Sanjeev Kumar   3
    4   Ashok Kumar     2

The below output is the correct result. It will give 245 rows. I am showing only 6 rows.

                Name            count
       0        Jagdish Raj     11
       1        Manmohan        10
       2        Iftekhar        9
       3        Shashi Kapoor   7
       4        Rakhee Gulzar   5
       5        Waheeda Rehman  5

The tables given are

  • Movie(MID,title,year)
  • M_Cast(MID,PID)
  • Person(PID,Name,Gender)
  • M_Director(MID,PID)

I am new in sql and having a lot of trouble in debugging the code. Kindly help me to understand the logic.

akki_buoy
  • 11
  • 4
  • 2
    Please provide the expected [MRE](https://stackoverflow.com/help/minimal-reproducible-example). Show where the intermediate results deviate from the ones you expect. We should be able to paste a single block of your code into file, run it, and reproduce your problem. – Prune Oct 16 '20 at 06:05
  • "My question landed like a dead duck so I'm back with the same thing again, hope you like it this time" is not a great strategy. Try and *improve your question* so we can better help you. – tadman Oct 16 '20 at 06:05
  • 1
    In particular, you should *know* which results are correct before you post here. If necessary, scroll through IMDB yourself and verify the movie counts. "I think it's wrong" is not a Stack Overflow question: that attempts to shift the testing burden to us. – Prune Oct 16 '20 at 06:07
  • I don't understand the purpose of the query. Your title says "Find all the actors that made more movies with Yash Chopra than any other director". So, I'd expect a list of actors. Okay, you show the movie count for Yash Chopra movies (and unnecessarily the Yash Chopra's name and ID). But why are you showing another director? Is this to say "For instance this is a director the actor had less movies with"? Why? Do you want to list all the actor's other directors? Or pick sample ones? What is the purpose? – Thorsten Kettner Oct 16 '20 at 07:41
  • And what DBMS are you using? – Thorsten Kettner Oct 16 '20 at 07:41
  • @ThorstenKettner no i am showing director name just to debug. i am using sqlite. – akki_buoy Oct 16 '20 at 07:59
  • @ThorstenKettner the correct result and my output are shown as two columns namely actor and count. The detailed output with director name is just to debug – akki_buoy Oct 16 '20 at 08:00

1 Answers1

1

The problem with your second part of the query is that you sum up all movies of all other directors instead of counting movies per director.

I see two approaches for the task:

  1. Per actor compare Yash Chopra's movie count to all others. (Which would requiere WHERE yash_chopra_count > ANY (...), but SQLite doesn't feature the ANY keyword.) Or compare Yash Chopra's movie count to the maximum movie count of the others. (Which does work in SQLite.)
  2. Rank directors per actor by movie count and see whether the best ranked actor is Yash Chopra. (Which should be done with ROW_COUNT).

Before we start, let's check, if we got everything right: The data model allows multiple directors per movie. But we can still just count. If an actor made three movies with Yash Chopra and three with I. S. Johar, we don't care whether these are six separate movies or maybe just three where Yash Chopra was I. S. Johar's assistent director. We see in my example for the actor there is another director with at least as many movies as with Yash Chopra, so we don't want to select this actor. The m:n relation is hence no hindrance. It doesn't make a difference.

Now that we made sure we can simply count movies per actor and director let's see the two approaches:

Compare Yash Chopra count with maximum other count

As mentioned, we must aggregate other directors twice (once for the count per director, once for the maximum count), because SQLite lacks the ANY operator. This isn't a big deal though when we use a CTE for readability.

with director_actor as
(
  select
    c.pid as pid_actor,
    d.pid as pid_director,
    count(*) as movie_count
  from m_director d
  join m_cast c on c.mid = d.mid
  group by c.pid, d.pid
)
select pid_actor, movie_count
from director_actor
where pid_director = (select pid from person where name = 'Yash Chopra')
and movie_count >
(
  select coalesce(max(movie_count), 0)
  from director_actor other
  where other.pid_actor = director_actor.pid_actor
  and other.pid_director <> director_actor.pid_director
);

Rank directors per actor by movie count

What steps must we undertake to get the actors in question?

  1. Count movies per actor and director.
  2. Rank the directors per actor. The director with most movies gets rank #1. If there is a tie, then we rank the other director better then Yash Chopra, so Yash Chopra only get's rank #1, if he has more movies than the other directors.
  3. Keep the actors where the best ranked director is Yash Chopra.

For the ranking I'd use ROW_NUMBER.

with ranked as
(
  select
    c.pid as pid_actor,
    d.pid as pid_director,
    yc.pid as pid_yash_chopra,
    count(*) as movie_count,
    row_number() over (partition by c.pid order by count(*) desc, d.pid = yc.pid) as rn
  from m_director d
  join m_cast c on c.mid = d.mid
  left join person yc on yc.pid = d.pid and name = 'Yash Chopra'
  group by c.pid, d.pid, yc.pid
)
select pid_actor, movie_count
from ranked
where rn = 1 and pid_director = pid_yash_chopra;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • Hi firstly thanks for such a descriptive answer. I have gone through the first solution wch seem convenient for me. However it returns only 17 rows as answer but actual answer is 245 rows. I tried to debug and found upto the part where u take the yash chopra comparison is correct and returns 430 rows but the last part where u compare movie_count > (Select max....) ,it is not giving correct comparison i guess.Can you please say where it might went wrong i tried trimming and debugging extensively. Thank you – akki_buoy Oct 17 '20 at 04:02
  • The second answer using rank showing the following error OperationalError: near "(": syntax error – akki_buoy Oct 17 '20 at 05:54
  • The second query is syntactically correct, so either you made an error when copying it or you are using an old SQLite version not yet supporting one of the features. This demo shows that there is no syntax error in the query: https://dbfiddle.uk/?rdbms=sqlite_3.27&fiddle=84d2c917dd2e1ab4a7039fd7dc2f40e5 – Thorsten Kettner Oct 17 '20 at 07:33
  • As to the first query: One problem I see now is that the subquery returns null for actors that exclusively played in Yash Chopra movies. I have corrected this by adding `COALESCE`. If this still doesn't work, then an example would help. Find one actor that must be in the results, but isn't. Show their data. You can use my dbfiddle and fill the tables there with such an actor, so I would be able to investigate. – Thorsten Kettner Oct 17 '20 at 07:37
  • With some changes in the code like TRIM and Joining Based on name instead of ID the answer matched.However i have accepted your answer. It was a nice explaination you solved many doubts of mine.Thank you – akki_buoy Oct 17 '20 at 10:57