Is there any harm to having a duplicate index in PostgreSQL?

Question

I have the following structure.

CREATE TABLE join_table (
  id integer NOT NULL,
  col_a integer NOT NULL,
  col_b integer NOT NULL
);

CREATE INDEX index_on_col_a ON join_table USING btree (col_a);
CREATE INDEX index_on_col_b ON join_table USING btree (col_b);
CREATE UNIQUE INDEX index_on_col_a_and_col_b ON join_table USING btree (col_a, col_b);

There are also foreign keys on col_a and col_b.

Clearly index_on_col_a is no longer needed, but is there a cost or benefit to keeping or deleting it?

My guess is:

keeping it will slow down inserts
selects using just col_a may be faster if I keep it

hmm... should I avoid guessing in questions? maybe someone has something more firm than a guess. — Matthew Rudy, Mar 21 '12 at 09:29
It depends on the case: better write performance or better query performance. But in my personal opinion, we should drop index index_on_col_a — francs, Mar 21 '12 at 09:38
thanks @francs. I usually would. I just wanted to get some verification that I'm right. I guess I'll just remove it. — Matthew Rudy, Mar 21 '12 at 10:05
We have discussed this case [in great detail at dba.SE recently](http://dba.stackexchange.com/q/6115/3684). — Erwin Brandstetter, Mar 21 '12 at 11:16

score 10 · Accepted Answer · edited Mar 21 '12 at 12:59

You can drop the index on col_a. PostgreSQL is able to use the combined index if you query on col_a and is also able to use the index if you query on col_a and col_b. These query types can use the combined index:

WHERE col_a = 'val'
WHERE col_a = 'val' AND col_b = 'val'

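The combined index is not an efficient choice for queries on col_b alone (PostgreSQL can still scan it, but without a condition on the leading column it has to walk the whole index) or for an OR combination of col_a and col_b. So the additional index on col_b makes sense if you frequently run queries that filter only on col_b.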
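One way to check this on your own data is to ask the planner directly. This is only a sketch, assuming the table and indexes from the question; the literal 42 is an arbitrary placeholder. The plan you get depends on table size and statistics (on a small table the planner may well pick a sequential scan), so no expected output is shown.

```sql
-- Leading column only: the planner can use index_on_col_a_and_col_b
-- (or index_on_col_a, as long as that index still exists).
EXPLAIN SELECT * FROM join_table WHERE col_a = 42;

-- Both columns: a natural fit for the combined unique index.
EXPLAIN SELECT * FROM join_table WHERE col_a = 42 AND col_b = 42;

-- Second column only: this is the case index_on_col_b is kept for.
EXPLAIN SELECT * FROM join_table WHERE col_b = 42;
```

The index name appears in the "Index Scan using ..." or "Bitmap Index Scan on ..." line of each plan.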

Edit: So you gain nothing from keeping index_on_col_a, and every write has to maintain it. Drop it.
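If you want some evidence before dropping it, the statistics views show how often each index has been scanned. A rough sketch, assuming the default statistics collection is enabled; the counters only cover the period since the last statistics reset, so treat them as a hint rather than proof.

```sql
-- Number of scans per index on the table since the last stats reset.
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'join_table';

-- Remove the redundant single-column index.
-- On a busy table, DROP INDEX CONCURRENTLY avoids blocking other sessions,
-- but it cannot be run inside a transaction block.
DROP INDEX index_on_col_a;
```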

score 0 · Answer 2 · answered Jan 20 '23 at 11:33

Although I agree with the other answer about dropping the index on col_a, the index on (col_a, col_b) can be large enough to occupy noticeably more disk pages than an index on col_a alone, which can mean more disk I/O. Use EXPLAIN ANALYZE and EXPLAIN (FORMAT JSON) to see the actual rows read and the estimated total cost (expressed in arbitrary units that roughly correspond to I/O operations).
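To put numbers on this, you can compare the on-disk size of the two indexes and look at the buffers a query touches. A sketch only, assuming both indexes from the question still exist; 42 is a placeholder value. Note that EXPLAIN with ANALYZE really executes the query.

```sql
-- On-disk size of each index.
SELECT pg_size_pretty(pg_relation_size('index_on_col_a'))           AS col_a_only,
       pg_size_pretty(pg_relation_size('index_on_col_a_and_col_b')) AS col_a_and_col_b;

-- Actual row counts, timing and buffer (page) usage, as JSON.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM join_table WHERE col_a = 42;
```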

If there are many col_b values per col_a value (say, more than 100 col_b values for a single col_a), then keeping the col_a index can help. It is more useful still if you run range queries. All of this only matters if you really care about very low read latency.
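A quick way to see which situation you are in is to look at how many rows share each col_a value. Because (col_a, col_b) is unique in the question's schema, rows per col_a equals the number of col_b values per col_a. This is an illustrative sketch; the range bounds 100 and 200 are made up.

```sql
-- How many col_b values does a col_a value typically have?
SELECT count(*) AS distinct_col_a,
       avg(n)   AS avg_col_b_per_col_a,
       max(n)   AS max_col_b_per_col_a
FROM (
    SELECT col_a, count(*) AS n
    FROM join_table
    GROUP BY col_a
) AS per_a;

-- Example range query to compare plans with and without index_on_col_a.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM join_table WHERE col_a BETWEEN 100 AND 200;
```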
