0

Say I have a table with 20 million rows I want to index like so:

CREATE INDEX fruit_color
  ON fruits
  USING btree
  (color);

Now let's say that only 2% of the fruits have a color, rest will be NULL. My queries will NEVER want to find fruits with color NULL (no color), so the question is, will it make a difference for postgresql if I change the index to:

 CREATE INDEX fruit_color
  ON fruits
  USING btree
  (color)
  WHERE color IS NOT NULL;

I don't know much about postgresql's internal way of handling indexes, so this is why I ask.

PS postgresql version is 9.2

Niels Kristian
  • 8,661
  • 11
  • 59
  • 117
  • My guess is that the performance of the index won't change much, because the portion of the BTree for colors will look similar in both cases. The size of the index might differ. Have you tried actually using both indices with `EXPLAIN`? – Tim Biegeleisen Nov 30 '16 at 14:08

1 Answers1

0

Yes, that will make a difference. How much of a difference depends on how the index is used.

If there is only one fruit with a certain color, and you search for this fruit by color, it won't make much of a difference; maybe one less page will be accessed (because the index has maybe one level of depth less).

If there are many fruits of a certain color, the improvement will be great, because it will be much cheaper to scan the whole index (for a bitmap index scan) or a bigger part of it (for a regular or index-only scan). If the index is big, PostgreSQL will be more reluctant to scan the complete index and will probably choose a sequential table scan instead.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • I implemented the two indexes, and I can say also the size of the index was very different. The partial one was a lot smaller! – Niels Kristian Dec 01 '16 at 10:32