25

Using the IMDB database, I have tables actor, casts, and movie, and I need to select actors with a Kevin Bacon number of 2. I thought this should do it, but I'm getting 0 rows returned. What is my error?

select fname, lname
from actor join casts on pid=actor.id
where actor.id in (
    select a3.id  --actors who have a kb number of 2
    from casts c3 join actor a3 on c3.pid=a3.id,
    (
     (select c1.mid --actors who have a kb number of 1
     from (casts c1 join actor a1 on c1.pid=a1.id), (casts c2 join actor a2 on c2.pid=a2.id)
     where c1.mid=c2.mid and a2.fname='Kevin' and a2.lname='Bacon')
    )Level1 where c3.mid=Level1.mid
)
and actor.id not in (select a4.id --and only a kb number of 2
     from (casts c4 join actor a4 on c4.pid=a4.id), (casts c5 join actor a5 on c5.pid=a5.id)
     where c4.mid=c5.mid and a5.fname='Kevin' and a5.lname='Bacon');

Here are the table schemas:

ACTOR (id, fname, lname, gender)
MOVIE (id, name, year)
CASTS (pid, mid, role)

mid is a foreign key to a movie id and pid is a foreign key to actor id.

Note that restrictions on the question prohibit me from using temp tables or recursion: the query should be done with subselects.


I also tried

select count(distinct pid) from casts join actor on pid=actor.id where mid in (
    select mid from casts where pid in (
        select distinct pid from casts where mid in (
            select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon')))

and pid not in  
    (select distinct pid from casts where mid in (
        select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon'));

which also seems like it should work, but it's not finishing.


I finally managed to get some working code:

select count(distinct pid) from casts where mid in (
    select mid from casts where pid in (
        select distinct pid from casts where mid in (
            select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon')))

and pid not in  
    (select distinct pid from casts where mid in (
        select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon'));

The subqueries return sensible answers, at least. But it's taking forever. Each subquery took under 30 seconds, but together they're taking 6 minutes and counting. Why?


Note: This was given to me as homework. To avoid any semblence of academic misconduct on my part, I'd prefer if people didn't post complete/exact solutions, but rather pointed out general things that I'm doing wrong/make general suggestions as to how I should go about this.

Gilles 'SO- stop being evil'
  • 104,111
  • 38
  • 209
  • 254
Colleen
  • 23,899
  • 12
  • 45
  • 75
  • 2
    @AbeMiessler http://sqlfiddle.com/ – scottm Feb 03 '12 at 22:57
  • make a test case: create TEMP tables with data – cetver Feb 03 '12 at 23:04
  • aww, and here I just figured out how to do it with a recursive CTE too (this was interesting in the context of dealing with cyclical graphs)... Why aren't you allowed to use them? Although, aren't some of the new non-SQL databases supposed to do better at this type of thing - I though somebody'd made a network-graph database? Are you allowed to use CTEs at all? – Clockwork-Muse Feb 03 '12 at 23:52
  • Thanks! I didn't know that. And I realized (after much pain), that I can use "not in" as long as the columns are the same. I ended up with a query similar to Martin's, but with not in instead of except that works and runs in 22 seconds :) – Colleen Feb 04 '12 at 04:08

2 Answers2

9

To give a sketch of a solution rather than an exact solution I would use this general approach

SELECT *
FROM   ACTOR
WHERE  id IN (
SELECT id 
       /* ... of actors that have worked on a film worked 
         on by actors that have worked on a KB film*/
EXCEPT
SELECT id
 /* ... of all actors that have worked on a KB film
         including KB himself*/ )

Also as you are not allowed to use recursive CTEs anyway here's an answer using those.

WITH RecursiveCTE
     AS (SELECT C.pid,
                C.mid,
                0 as Level
         FROM   CASTS C
                JOIN ACTOR A
                  ON A.id = C.pid
         WHERE  A.fname = 'Kevin'
                and A.lname = 'Bacon'
         UNION ALL
         SELECT c1.pid,
                c2.mid,
                R.Level + 1
         FROM   RecursiveCTE R
                JOIN CASTS c1
                  ON c1.mid = R.mid
                     AND R.Level < 2
                JOIN CASTS c2
                  ON c1.pid = c2.pid)
SELECT *
FROM   ACTOR
WHERE  id IN (SELECT pid
              FROM   RecursiveCTE
              GROUP  BY pid
              HAVING MIN(Level) = 2)  
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • looking for more general ideas re: what I'm doing wrong/what I should do, but I'll run your query and vote you up if the answer matches the answer I (hopefully, eventually) get. – Colleen Feb 04 '12 at 00:13
  • Hmmm this is still an actual answer, which I think he wanted to avoid. – ErikE Feb 04 '12 at 00:13
  • This is an actual answer, but I posted that comment after he posted the answer, so it's ok. – Colleen Feb 04 '12 at 00:15
  • @Colleen - Well the general idea is to find actors that have worked on films worked on by those that have worked in KB films then subtract all actors that have worked on a KB film (and thus have a bacon number of 0 or 1) – Martin Smith Feb 04 '12 at 00:16
  • also it seems right. I want to figure out why my way isn't working (or is just taking forever), though. – Colleen Feb 04 '12 at 00:16
  • Yeah, that makes sense... that's what I'm trying to do. So now I'm focusing on the second query I give and trying to make that work... so: I get the movies that kevin bacon has been in. Then I get the distinct pids where mid is in those movies. (So those should be kb=1 people, plus kb himself, right?). Then, I get all the movies those people have been in. Then I get all of the people who have been in those movies (so that should be kb=1 and kb=2). Then I subtract kb=1. Why doesn't it worrrkkkkk??? – Colleen Feb 04 '12 at 00:17
  • @Colleen - I think that query might work. When I try it on very small simple tables it works anyway. – Martin Smith Feb 04 '12 at 00:33
  • I think it does, it just takes forever and I'm not sure why it takes SO much longer than yours. The subselects, subtracted, give the same answer. – Colleen Feb 04 '12 at 00:39
  • @Colleen - Maybe you are missing some useful index(es) – Martin Smith Feb 04 '12 at 00:52
0

Is this correct?

select distinct name from M_Cast c1 join M_Cast c2 on c2.mid=c1.mid
join M_cast  c3 on c2.pid=c3.pid join M_Cast c4 on c4.mid=c3.mid 
join person p on p.pid= trim(c4.pid) where  trim(c1.pid)="nm0451321" and p.pid!="nm0451321"

except

select distinct trim(name) from M_Cast c1 join M_Cast c2 on c1.mid=c2.mid
join person p on p.pid= trim(c2.pid) where  trim(c1.pid)="nm0451321" and p.pid!="nm0451321"

nm0451321--->id of kevin bacon

Selim Yildiz
  • 5,254
  • 6
  • 18
  • 28