Explain Plan Fails to Use Index

Question

We have a query written in Oracle that has been adapted to run in PostgreSQL. The logic is identical, as are the contents of the tables, but the Oracle query runs significantly faster. The table structures match across the two databases.

In evaluating the explain plan, there is a major difference in that Oracle leverages an index (primary key, actually), whereas the PostgreSQL query does not.

I have attempted to simplify the structures as follows to demonstrate the issue.

create table stage.work_order_operations (
  work_order text not null,
  sequence_number integer not null,
  status text,
  remaining_hours numeric,
  complete_date date,
  constraint work_order_operations_pk primary key (work_order, sequence_number)
);

create index work_order_operations_ix1 on stage.work_order_master (complete_date);

create table stage.work_order_master (
  work_order text not null,
  status_code text not null,
  part_number text not null,
  quantity integer,
  constraint work_order_master_pk primary key (work_order)
);

create index work_order_master_ix1 on stage.work_order_master (status_code);

The query is as follows:

select
  op.*
from
  stage.work_order_master wo
  join stage.work_order_operations op on
    wo.work_order = op.work_order
where
  wo.status_code <= '90'

In this case, the limit status_code <= '90' severely limits the amount of records from the wo table, from tens of millions to around 15,000 records. I would have expected the query to leverage the limited data set and use the work_order_operations_pk index (key), but it's not:

Hash Join  (cost=19.93..40.52 rows=207 width=200)'
  Hash Cond: (op.work_order = wo.work_order)'
  ->  Seq Scan on work_order_operations op  (cost=0.00..16.20 rows=620 width=100)'
  ->  Hash  (cost=17.34..17.34 rows=207 width=100)'
        ->  Bitmap Heap Scan on work_order_master wo  (cost=4.75..17.34 rows=207 width=100)'
              Recheck Cond: (status_code <= '90'::text)'
              ->  Bitmap Index Scan on work_order_master_ix1  (cost=0.00..4.70 rows=207 width=0)'
                    Index Cond: (status_code <= '90'::text)'

I have several questions:

Is it possible the explain plan is not consistent with the execution plan, and PostgreSQL really is using the index?
Is there a way to see the actual execution plan on PostgreSQL after the query runs for real?
Is there a way to force the database to invoke an index, similar to hints in Oracle (although I don't know if there is such a hint -- I know there is one that can suppress the use of an index)
Does anyone have any other thoughts to make this query run quicker?

Regarding question #1, by all appearances the index is not being used, so I think I know the answer to that one, but I wanted to be sure.

-- EDIT 10/19/15 --

Thanks to all for the feedback and suggestions. For what it's worth, here are some statistics and a possible stop-gap.

I have created an interim table that represents the work order operations associated with open work orders (status < 91). This takes the population of records from over 40,000,000 to around 160,000.

Method 1: Standard join -- takes 283 seconds

truncate table stage.open_work_order_operations;

insert into stage.open_work_order_operations
SELECT 
  op.*  
FROM 
  stage.work_order_master wo
  join stage.work_order_operations op on
    wo.work_order = op.work_order
WHERE 
  wo.status <= '91' and
  op.complete_date >= '2006-01-01';

Method 2: Semi-join -- takes 242 seconds:

truncate table stage.open_work_order_operations;

insert into stage.open_work_order_operations
SELECT 
  op.*  
FROM 
  stage.work_order_operations op
where
  exists (
    select null
    from stage.work_order_master wo
    where
      wo.work_order = op.work_order  AND
      wo.status <= '91'
  ) and
  op.complete_date >= '2006-01-01';

Method 3: In-List subquery -- takes 216 seconds

truncate table stage.open_work_order_operations;

insert into stage.open_work_order_operations
SELECT 
  op.*  
FROM 
  stage.work_order_operations op
WHERE 
  op.work_order in (
      select work_order
      from stage.work_order_master 
      where status <= '91') and
  op.complete_date >= '2006-01-01';

And here is the interesting one. If I wrap this in the function and treat the list of work orders as an array, it completes in 166 seconds:

CREATE OR REPLACE FUNCTION stage.open_work_order_data()
  RETURNS integer AS
$BODY$
DECLARE
  rowcount integer := 0;
  work_orders text[];
BEGIN

  select array_agg(work_order)
  into work_orders
  from stage.work_order_master
  where status <= '91';

  truncate table stage.open_work_order_operations;

  insert into stage.open_work_order_operations
  select *
  from stage.work_order_operations op
  where op.work_order = any (work_orders)
  and complete_date >= '2006-01-01';

  GET DIAGNOSTICS rowcount = ROW_COUNT;

  return rowcount;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

Several have recommended to run an execution plan. I want to do this, especially on the last one, as I suspect it is using work_order_operations_pk, which should make a huge difference.

Also, the tables do have current stats on them. When I ran the explain plan originally listed, I was doing so with a subset of data, just to demonstrate my issue. The behavior on the subset and the entire dataset appear to be identical.

1) you use varchar/text columns as primary keys 2) you don't have any FK relation 3) do you have valid statistics? — wildplasser, Oct 15 '15 at 22:10
BTW: since the whole work_order query is never referenced in the outer query, it could be squeezez out into an EXIST() subquery, together with the <= '90' condition. Bat that wouldnt solve anything, IMO. — wildplasser, Oct 15 '15 at 22:34
Create an index on (status_code,work_order) (or the other way around). PG just didn't decide to use two indexes. — Jakub Kania, Oct 15 '15 at 23:21
@wildplasser -- I added the FK, but unfortunately it appears to yield the same result. Stats are updated. I'll reload the tables and see if it makes a difference. Dmitri - that field already is an integer; can you clarify? — Hambone, Oct 16 '15 at 00:20
@wildplasser, my actual query uses fields from both tables -- I simplified this to illustrate the issue. It didn't hurt to try, so I did a semi-join / exists to see if it would make a difference, and unfortunately it did not — Hambone, Oct 16 '15 at 02:07
@JakubKania, I'm pretty sure Pg can use more than one index, and I didn't think an index on the _master table would impact the decision to use an index on the other table. That said, I tried it anyway, and unfortunately there is no change. I can't help but think I'm missing something fundamental — Hambone, Oct 16 '15 at 02:09
Well, we cannot optimise your actual query if you don't show it. 1) why are your key elements (including status) all text fields? 2) What is the proportion of status = '90' tuples you expect to find ? 3) add the output of `EXPLAIN ANALYZE the_same_query;` to your question. (the above query looks pretty fast to me), given no FK constraint) — wildplasser, Oct 16 '15 at 10:03
@Hambone Yes it can use more than one, on the same table even. And yes, adding an index on any of the tables can change the plan to use or not use index on another. — Jakub Kania, Oct 16 '15 at 17:06
PostgreSQL thinks work_order_operations has 620 rows. That is way off from what you say it has. Have you run ANALYZE on your database, so that PostgreSQL has the stats it needs to make good decisions? — jjanes, Oct 16 '15 at 17:56

score 0 · Answer 1 · answered Oct 16 '15 at 04:16

So, you are filtering the wo table to 15,000 rows and then you want the server to do 15,000 seeks in the op table using its primary key instead of scanning the op table. Did I get it right?

You can try to rewrite the query to follow your preferred flow like this:

WITH
CTE
AS
(
    SELECT wo.work_order
    FROM stage.work_order_master AS wo
    WHERE wo.status_code <= '90'
)
SELECT T.*
FROM
    CTE
    INNER JOIN LATERAL
    (
        SELECT op.*
        FROM stage.work_order_operations AS op
        WHERE op.work_order = CTE.work_order
    ) AS T ON true

Postgres materializes CTE, so the first step of filtering the wo table should use the index work_order_master_ix1.

LATERAL join explicitly says that for each row from filtered wo table we want to find a row from op table. It produces the same result as simple INNER JOIN in this case, but this syntax may "hint" optimizer to do a seek in op table for each row from CTE instead of doing a hash join and scanning the op table.

Please try it on your data and let us know what execution plan you are getting.

Thanks so much for this solution and explanation. Unfortunately, it looks like the query still won't use `work_order_master_pk` and is scanning all 40,000,000 rows. I'm at a total loss. — Hambone, Oct 16 '15 at 17:26
@Hambone, why should it use `work_order_master_pk`? It must be using `work_order_master_ix1` to filter `wo.status_code <= '90'`. I'm confused. Please edit the question and state clearly how many rows each table has, how many rows satisfy the filter `wo.status_code <= '90'`, how many rows the whole query returns. And show us the **actual** execution plan when running on the full data set. Make sure you update all statistics. — Vladimir Baranov, Oct 17 '15 at 00:00

Explain Plan Fails to Use Index

1 Answers1