2

I try to analyze plans for join query benchmark https://github.com/gregrahn/join-order-benchmark

For example, I execute the following command:

EXPLAIN SELECT *
FROM aka_name AS an,
     cast_info AS ci,
     company_name AS cn,
     keyword AS k,
     movie_companies AS mc,
     movie_keyword AS mk,
     name AS n,
     title AS t
WHERE an.person_id = n.id
  AND n.id = ci.person_id
  AND ci.movie_id = t.id
  AND t.id = mk.movie_id
  AND mk.keyword_id = k.id
  AND t.id = mc.movie_id
  AND mc.company_id = cn.id
  AND an.person_id = ci.person_id
  AND ci.movie_id = mc.movie_id
  AND ci.movie_id = mk.movie_id
  AND mc.movie_id = mk.movie_id;

and as a result, I got the following query plan

                                                       QUERY PLAN                                                                                 [0/1803]
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1973375.70..22192463.47 rows=22337517790 width=449)
   Hash Cond: (ci.movie_id = t.id)
   ->  Merge Join  (cost=102.03..2617413.84 rows=88800840 width=203)
         Merge Cond: (n.id = an.person_id)
         ->  Merge Join  (cost=0.87..2341713.60 rows=36244344 width=130)
               Merge Cond: (ci.person_id = n.id)
               ->  Index Scan using person_id_cast_info on cast_info ci  (cost=0.44..1714393.60 rows=36244344 width=56)
               ->  Index Scan using name_pkey on name n  (cost=0.43..163847.25 rows=4167379 width=74)
         ->  Materialize  (cost=0.42..69770.80 rows=901343 width=73)
               ->  Index Scan using person_id_aka_name on aka_name an  (cost=0.42..67517.44 rows=901343 width=73)
   ->  Hash  (cost=834975.33..834975.33 rows=24906348 width=246)
         ->  Hash Join  (cost=486218.85..834975.33 rows=24906348 width=246)
               Hash Cond: (mk.movie_id = t.id)
               ->  Hash Join  (cost=4885.82..131552.82 rows=4523930 width=37)
                     Hash Cond: (mk.keyword_id = k.id)
                     ->  Seq Scan on movie_keyword mk  (cost=0.00..69693.30 rows=4523930 width=12)
                     ->  Hash  (cost=2290.70..2290.70 rows=134170 width=25)
                           ->  Seq Scan on keyword k  (cost=0.00..2290.70 rows=134170 width=25)
               ->  Hash  (cost=372278.91..372278.91 rows=2609129 width=209)
                     ->  Hash Join  (cost=141184.56..372278.91 rows=2609129 width=209)
                           Hash Cond: (mc.movie_id = t.id)
                           ->  Hash Join  (cost=11266.43..106748.81 rows=2609129 width=115)
                                 Hash Cond: (mc.company_id = cn.id)
                                 ->  Seq Scan on movie_companies mc  (cost=0.00..44881.29 rows=2609129 width=40)
                                 ->  Hash  (cost=5344.97..5344.97 rows=234997 width=75)
                                       ->  Seq Scan on company_name cn  (cost=0.00..5344.97 rows=234997 width=75)
                           ->  Hash  (cost=61280.28..61280.28 rows=2528228 width=94)
                                 ->  Seq Scan on title t  (cost=0.00..61280.28 rows=2528228 width=94)
 JIT:

as you can see, condition mc.movie_id = mk.movie_id not present in this plan. How and why it possible?

Mihail Salnikov
  • 137
  • 2
  • 11

2 Answers2

4

Look at the last 3 conditions:

AND ci.movie_id = mc.movie_id
AND ci.movie_id = mk.movie_id
AND mc.movie_id = mk.movie_id;

Using movie_id, you are matching table ci with mc, then ci with mk which therefore implies that mc matches mk, and therefore the last condition is superfluous and the planner rightfully ignores it.

JGH
  • 15,928
  • 4
  • 31
  • 48
1

JGH answered this.

This isn't an answer, but this is the proper way to write joins (the INNER isn't strictly necessary but I prefer explicit). FROM table1,table2 WHERE ... is a bad habit you should break immediately. That syntax is not very flexible and can make even simple queries near impossible to read.

SELECT * 
  FROM aka_name AS an
 INNER
  JOIN NAME AS n
    ON an.person_id = n.id
 INNER
  JOIN cast_info AS ci
    ON an.person_id = ci.person_id
   AND n.id = ci.person_id
 INNER
  JOIN title AS T
    ON ci.movie_id = t.id
 INNER
  JOIN movie_keyword AS mk
    ON t.id = mk.movie_id
   AND ci.movie_id = mk.movie_id
 INNER
  JOIN movie_companies AS mc
    ON t.id = movie_companies.movie_id
   AND ci.movie_id = mc.movie_id
   AND mc.movie_id = mk.movie_id
 INNER
  JOIN keyword AS K
    ON mk.keyword_id = k.id
 INNER
  JOIN company_name cn
    ON mc.mc.company_id = cn.id;
Error_2646
  • 2,555
  • 1
  • 10
  • 22