2

Setup

A table with one jsonb column attributes and a non unique numeric ID campaignid:

  CREATE TABLE coupons (
    id integer NOT NULL,
    created timestamp with time zone DEFAULT now( ) NOT NULL,
    campaignid bigint NOT NULL,
    attributes jsonb NOT NULL
  );

This table would have up to 500M rows, arbitrary key/values in attributes and hundreds of different campaignid values.

Two indexes exist on the table:

CREATE INDEX campaignid_attrs_idx ON coupons
USING gin (campaignid,attributes);

CREATE INDEX campaignid_idx ON coupons
USING btree (campaignid, deleted);

What I did

I executed the query:

SELECT COUNT(*)
FROM coupons
WHERE 
(campaignid = 97 AND
   attributes @> '{"CountryId": 3}');

Expected Results

I expected the index campaignid_attrs_idx on (campaignid,attributes) to be fully used and the query to complete quite fast.

Actual Result

The query took a long time (~40 seconds) to execute.

Here's the output from explain (ANALYZE, COSTS):

Aggregate  (cost=32337.78..32337.79 rows=1 width=8) (actual time=39726.410..39726.414 rows=1 loops=1)
  ->  Bitmap Heap Scan on coupons  (cost=30164.40..32332.44 rows=2136 width=0) (actual time=16893.439..39549.891 rows=1088478 loops=1)
"        Recheck Cond: ((attributes @> '{""CountryId"": 3}'::jsonb) AND (campaignid = 97))"
        Rows Removed by Index Recheck: 10531586
        Heap Blocks: exact=138344 lossy=583282
        ->  BitmapAnd  (cost=30164.40..30164.40 rows=2136 width=0) (actual time=16837.885..16837.887 rows=0 loops=1)
              ->  Bitmap Index Scan on coupons_campaignid_attrs_index  (cost=0.00..1465.15 rows=178954 width=0) (actual time=9872.279..9872.279 rows=81799565 loops=1)
"                    Index Cond: (attributes @> '{""CountryId"": 3}'::jsonb)"
              ->  Bitmap Index Scan on campaignid_idx  (cost=0.00..28697.93 rows=2135515 width=0) (actual time=6454.972..6454.972 rows=3088167 loops=1)
                    Index Cond: (campaignid = 97)
Planning Time: 0.175 ms
Execution Time: 39726.480 ms

Conclusions

It seems like the index campaignid_attrs_idx was used for the first part of the query attributes @> '{"CountryId": 3}' returning ~80M rows, while the index campaignid_idx was used on the second part of the WHERE clause campaignid = 97 in parallel returning ~3M rows. Results from both parts were intersected to arrive at a set that fulfills both conditions. Then there was a Bitmap Heap Scan which verified that the result set complies with the desired conditions which took most of the time (16893.439..39549.891)


My main question, why wasn't campaignid_attrs_idx used to filter both conditions?



EDIT: I removed the second index campaignid_attrs_idx to see if then the multicolumn index will be used for both conditions. Strangely I still see that the only one of the conditions used in the index scan. Here's the plan:

Aggregate  (cost=181951.27..181951.28 rows=1 width=8) (actual time=209633.017..209633.018 rows=1 loops=1)
  ->  Bitmap Heap Scan on coupons  (cost=1424.30..181945.81 rows=2183 width=0) (actual time=8938.605..209401.433 rows=1091580 loops=1)
"        Recheck Cond: (attributes @> '{""CountryId"": 3}'::jsonb)"
        Rows Removed by Index Recheck: 31487517
        Filter: (campaignid = 97)
        Rows Removed by Filter: 80674951
        Heap Blocks: exact=121875 lossy=5572599
        ->  Bitmap Index Scan on coupons_campaignid_attributes_idx  (cost=0.00..1423.75 rows=179434 width=0) (actual time=8908.682..8908.682 rows=81802589 loops=1)
"              Index Cond: (attributes @> '{""CountryId"": 3}'::jsonb)"
Planning Time: 6.885 ms
Execution Time: 209638.234 ms
Alechko
  • 1,406
  • 1
  • 13
  • 27

0 Answers0