1

We have a table with 4 million records, and for a particular frequently used use-case we are only interested in records with a particular salesforce userType of 'Standard' which are only about 10,000 out of 4 million. The other usertype's that could exist are 'PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess' and 'CsnOnly'.

So for this use case I thought creating a partial index would be better, as per the documentation.

So I am planning to create this partial index to speed up queries for records with a usertype of 'Standard' and prevent the request from the web from getting timed out:

CREATE INDEX user_type_idx ON user_table(userType)
WHERE userType NOT IN
   ('PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess', 'CsnOnly');

The lookup query will be

select * from user_table where userType='Standard';

Could you please confirm if this is the right way to create the partial index? It would of great help.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
user2359997
  • 561
  • 1
  • 16
  • 40

2 Answers2

4

Postgres can use that but it does so in a way that is (slightly) less efficient than an index specifying where user_type = 'Standard'.

I created a small test table with 4 million rows, 10.000 of them having the user_type 'Standard'. The other values were randomly distributed using the following script:

create table user_table
(
  id serial primary key,
  some_date date not null,
  user_type text not null,
  some_ts timestamp not null, 
  some_number integer not null, 
  some_data text, 
  some_flag boolean
);

insert into user_table (some_date, user_type, some_ts, some_number, some_data, some_flag)
select current_date, 
       case (random() * 4 + 1)::int
         when 1 then 'PowerPartner'
         when 2 then 'CSPLitePortal'
         when 3 then 'CustomerSuccess'
         when 4 then 'PowerCustomerSuccess'
         when 5 then 'CsnOnly'
       end, 
       clock_timestamp(),
       42, 
       rpad(md5(random()::text), (random() * 200 + 1)::int, md5(random()::text)), 
       (random() + 1)::int = 1
from generate_series(1,4e6 - 10000) as t(i)
union all 
select current_date, 
       'Standard',
       clock_timestamp(),
       42, 
       rpad(md5(random()::text), (random() * 200 + 1)::int, md5(random()::text)), 
       (random() + 1)::int = 1
from generate_series(1,10000) as t(i);

(I create tables that have more than just a few columns as the planner's choices are also driven by the size and width of the tables)

The first test using the index with NOT IN:

create index ix_not_in on user_table(user_type) 
where user_type not in ('PowerPartner', 'CSPLitePortal', 'CustomerSuccess', 'PowerCustomerSuccess', 'CsnOnly');
explain (analyze true, verbose true, buffers true) 
select *
from user_table
where user_type = 'Standard'

Results in:

QUERY PLAN                                                                                                                      
--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on stuff.user_table  (cost=139.68..14631.83 rows=11598 width=139) (actual time=1.035..2.171 rows=10000 loops=1)
  Output: id, some_date, user_type, some_ts, some_number, some_data, some_flag                                                  
  Recheck Cond: (user_table.user_type = 'Standard'::text)                                                                       
  Buffers: shared hit=262                                                                                                       
  ->  Bitmap Index Scan on ix_not_in  (cost=0.00..136.79 rows=11598 width=0) (actual time=1.007..1.007 rows=10000 loops=1)      
        Index Cond: (user_table.user_type = 'Standard'::text)                                                                   
        Buffers: shared hit=40                                                                                                  
Total runtime: 2.506 ms                                                                                                         

(The above is a typical execution time after I ran the statement about 10 times to eliminate caching issues)

As you can see the planner uses a Bitmap Index Scan which is a "lossy" scan that needs an extra step to filter out false positives.

When using the following index:

create index ix_standard on user_table(id) 
where user_type = 'Standard';

This results in the following plan:

QUERY PLAN                                                                                                                              
----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_standard on stuff.user_table  (cost=0.29..443.16 rows=10267 width=139) (actual time=0.011..1.498 rows=10000 loops=1)
  Output: id, some_date, user_type, some_ts, some_number, some_data, some_flag                                                          
  Buffers: shared hit=313                                                                                                               
Total runtime: 1.815 ms                                                                                                                 

Conclusion:

Your index is used but an index on only the type that you are interested in is a bit more efficient.

The runtime is not that much different. I executed each explain about 10 times, and the average for the ix_standard index was slightly below 2ms and the average of the ix_not_in index was slightly above 2ms - so not a real performance difference.

But in general the Index Scan will scale better with increasing table sizes than the Bitmap Index Scan will do. This is basically due to the "Recheck Condition" - especially if not enough work_mem is available to keep the bitmap in memory (for larger tables).

  • Thank you very much for your detailed response! It made me understand a lot of thing's – user2359997 Oct 16 '18 at 06:32
  • @user2359997: I hope this also shows you how to test things like that for yourself in the future. The key point to analyzing performance is `explain (analyze)` –  Oct 16 '18 at 06:33
  • yes it does! thanks again for your valuable guidance! – user2359997 Oct 16 '18 at 06:36
1

For the index to be used, the WHERE condition must be used in the query as you wrote it.

PostgreSQL has some ability to make deductions, but it won't be able to infer that userType = 'Standard' is equivalent to the condition in the index.

Use EXPLAIN to find out if your index can be used.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263