1

I have a weird problem with postgresql. Somewhy planner thinks that accessing data via foreign key index is very slow and uses sequental scan.

There's a simplified table structure:

create table flight
(
    id        bigint not null constraint flight_pkey        primary key,
    parent_id bigint not null constraint flight_parent_fkey references flight
);

create table passenger
(
    id        bigint not null constraint passenger_pkey     primary key,
    flight_id bigint not null constraint pax_flight_fkey    references flight
);

create index pax_flight_id on passenger (flight_id);

Table passenger has about 14 million rows and table flight has around 60 thousands.

The problem is, I have a QueryDSL query with lots of optional conditions, and one of them filters passengers by their flights.

QPassenger qPassenger = QPassenger.passenger;
Long flightId = 123456;
...

BooleanBuilder predicate = new BooleanBuilder();
predicate.and(qPassenger.flight.id.eq(flightId));

...

And when I try to fetch all the passengers that fit these optional conditions it generates query like one below, and it executes for the whole thirty seconds. That's terrifying. It somewhy uses sequental scan and hash join over all the passenger table.

select
    passenger0_.id        as id1_13_,
    passenger0_.flight_id as flight_14_13_
from
    passenger passenger0_ cross join flights flight1_
where
    passenger0_.flight_id = 123456
 or flight1_.parent_id = 123456

However, after a day of solution-seeking I found that using primary key of the flight table will make postgres use primary key index:

select
    passenger0_.id        as id1_13_,
    passenger0_.flight_id as flight_14_13_
from
    passenger passenger0_ cross join flights flight1_
where
    flight1_.id = 123456            --  ←this line!
 or flight1_.parent_id = 123456

Unfortunately, I cannot filter rows I received by hands, as it will result in ~13 millions of rows, opposite to ~300 passengers per flight when flightId is set.

➥ So, my question is: is there any way to tell QueryDSL/Hibernate to use the specific column for this condition? I.e. flight.id, not passenger.flight_id.

Or, another question: what's wrong with my PostgreSQL planner and how can I fix it?


UPD Planner's plans:

  • Good query that uses primary key in WHERE-condition:
EXPLAIN ANALYZE
SELECT *
  FROM passenger p JOIN flights f ON p.flight_id = f.id
 WHERE (f.id = 123456
     OR f.parent_id = 123456);
QUERY PLAN
Nested Loop  (cost=3.66..1759.26 rows=310 width=951) (actual time=0.044..0.242 rows=184 loops=1)
  ->  Bitmap Heap Scan on flights f  (cost=3.10..9.78 rows=6 width=240) (actual time=0.024..0.029 rows=4 loops=1)
        Recheck Cond: ((id = 123456) OR (parent_id = 123456))
        Heap Blocks: exact=4
        ->  BitmapOr  (cost=3.10..3.10 rows=6 width=0) (actual time=0.018..0.018 rows=0 loops=1)
              ->  Bitmap Index Scan on flights_pkey  (cost=0.00..1.53 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)
                    Index Cond: (id = 123456)
              ->  Bitmap Index Scan on flt_parent_id_index  (cost=0.00..1.56 rows=5 width=0) (actual time=0.009..0.009 rows=3 loops=1)
                    Index Cond: (parent_id = 123456)
  ->  Index Scan using passenger_flight_id on passenger p  (cost=0.56..286.73 rows=485 width=711) (actual time=0.005..0.023 rows=46 loops=4)
        Index Cond: (flight_id = f.id)
Planning Time: 0.566 ms
Execution Time: 0.321 ms
  • Bad query that uses foreign key in WHERE-condition:
EXPLAIN ANALYZE
SELECT *
  FROM passenger p JOIN flights f ON p.flight_id = f.id
 WHERE (p.flight_id = 123456
     OR f.parent_id = 123456);
QUERY PLAN
Gather  (cost=34194.96..3461993.92 rows=734 width=951) (actual time=79878.815..80711.129 rows=184 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Hash Join  (cost=33194.96..3460920.52 rows=306 width=951) (actual time=71883.434..72044.345 rows=61 loops=3)
        Hash Cond: (p.flight_id = f.id)
        Join Filter: ((p.flight_id = 123456) OR (f.parent_id = 123456))
        Rows Removed by Join Filter: 11206038
        ->  Parallel Seq Scan on passenger p  (cost=0.00..827052.82 rows=14216282 width=711) (actual time=20.021..27298.757 rows=11206100 loops=3)
        ->  Parallel Hash  (cost=20891.65..20891.65 rows=275065 width=240) (actual time=1284.916..1284.917 rows=219796 loops=3)
              Buckets: 8192  Batches: 128  Memory Usage: 1248kB
              ->  Parallel Seq Scan on flights f  (cost=0.00..20891.65 rows=275065 width=240) (actual time=2.134..966.560 rows=219796 loops=3)
Planning Time: 0.605 ms
Execution Time: 80711.774 ms
Xobotun
  • 1,121
  • 1
  • 18
  • 29
  • What if you would set an index on the foreign key column? – Simon Martinelli Oct 22 '19 at 14:37
  • @SimonMartinelli, unfortunately, it is already there. But planner does not want to use it, so I'm looking for a way to use this index. And forbidding sequential scan also does not work, planner behaves as if the index does not exist at all. Recreating the index also did not help. Thanks for the help, though. – Xobotun Oct 22 '19 at 18:21

0 Answers0