Using the IMDB database, I have tables actor
, casts
, and movie
, and I need to select actors with a Kevin Bacon number of 2. I thought this should do it, but I'm getting 0 rows returned. What is my error?
select fname, lname
from actor join casts on pid=actor.id
where actor.id in (
select a3.id --actors who have a kb number of 2
from casts c3 join actor a3 on c3.pid=a3.id,
(
(select c1.mid --actors who have a kb number of 1
from (casts c1 join actor a1 on c1.pid=a1.id), (casts c2 join actor a2 on c2.pid=a2.id)
where c1.mid=c2.mid and a2.fname='Kevin' and a2.lname='Bacon')
)Level1 where c3.mid=Level1.mid
)
and actor.id not in (select a4.id --and only a kb number of 2
from (casts c4 join actor a4 on c4.pid=a4.id), (casts c5 join actor a5 on c5.pid=a5.id)
where c4.mid=c5.mid and a5.fname='Kevin' and a5.lname='Bacon');
Here are the table schemas:
ACTOR (id, fname, lname, gender)
MOVIE (id, name, year)
CASTS (pid, mid, role)
mid
is a foreign key to a movie id and pid
is a foreign key to actor id.
Note that restrictions on the question prohibit me from using temp tables or recursion: the query should be done with subselects.
I also tried
select count(distinct pid) from casts join actor on pid=actor.id where mid in (
select mid from casts where pid in (
select distinct pid from casts where mid in (
select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon')))
and pid not in
(select distinct pid from casts where mid in (
select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon'));
which also seems like it should work, but it's not finishing.
I finally managed to get some working code:
select count(distinct pid) from casts where mid in (
select mid from casts where pid in (
select distinct pid from casts where mid in (
select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon')))
and pid not in
(select distinct pid from casts where mid in (
select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon'));
The subqueries return sensible answers, at least. But it's taking forever. Each subquery took under 30 seconds, but together they're taking 6 minutes and counting. Why?
Note: This was given to me as homework. To avoid any semblence of academic misconduct on my part, I'd prefer if people didn't post complete/exact solutions, but rather pointed out general things that I'm doing wrong/make general suggestions as to how I should go about this.