SELECT with OR using two columns with index is very slow

Question

There is a table with the following definition in my Cockroach database:

CREATE TABLE foo_value (
    foo_id_a INT NOT NULL,
    foo_id_b INT NOT NULL,
    value FLOAT NULL,
    create_date_time TIMESTAMP NULL,
    update_date_time TIMESTAMP NULL,
    CONSTRAINT "primary" PRIMARY KEY (foo_id_a ASC, foo_id_b ASC),
    INDEX foo_value_foo_id_a_foo_id_b_idx (foo_id_a ASC, foo_id_b ASC),
    INDEX foo_id_a_idx (foo_id_a ASC),
    INDEX foo_id_b_idx (foo_id_b ASC),
    FAMILY "primary" (foo_id_a, foo_id_b, value, create_date_time, update_date_time)
)

It contains roughly 400000 rows.

Querying for one of the two IDs is fast:

SELECT * FROM foo_db.foo_value WHERE foo_id_a = 123456;
takes 0.071 s

SELECT * FROM foo_db.foo_value WHERE foo_id_b = 123456;
takes 0.086 s

However querying for one OR the other is very slow:

SELECT * FROM foo_db.foo_value WHERE foo_id_a = 123456 OR foo_id_b = 123456;
takes 2.739 s

Why is this?

The results from EXPLAIN look as follows:

EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_a = 321210483;
+-------+------+-------+-----------------------+
| Level | Type | Field |      Description      |
+-------+------+-------+-----------------------+
|     0 | scan |       |                       |
|     0 |      | table | foo_value@primary     |
|     0 |      | spans | /321210483-/321210484 |
+-------+------+-------+-----------------------+


EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_b = 321210483;
+-------+------------+-------+------------------------+
| Level |    Type    | Field |      Description       |
+-------+------------+-------+------------------------+
|     0 | index-join |       |                        |
|     1 | scan       |       |                        |
|     1 |            | table | foo_value@foo_id_b_idx |
|     1 |            | spans | /321210483-/321210484  |
|     1 | scan       |       |                        |
|     1 |            | table | foo_value@primary      |
+-------+------------+-------+------------------------+


EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_a = 321210483 OR foo_id_b = 321210483;
+-------+------+-------+-------------------+
| Level | Type | Field |    Description    |
+-------+------+-------+-------------------+
|     0 | scan |       |                   |
|     0 |      | table | foo_value@primary |
|     0 |      | spans | ALL               |
+-------+------+-------+-------------------+

What does the proposed and actual execution plans look like? — Simon Price, Apr 23 '18 at 11:32

score 7 · Accepted Answer · answered Apr 23 '18 at 11:33

What you are asking for is an index optimization that uses two different indexes in the or. Alas, SQL engines do not in general support this optimization (although Oracle does as might some other databases).

You are best off using union all:

SELECT *
FROM foo_db.foo_value
WHERE foo_id_a = 123456;
UNION ALL
SELECT *
FROM foo_db.foo_value
WHERE foo_id_b = 123456 AND foo_id_a <> 123456;

(Note: You might need to take NULL values into account for the second WHERE clause.)

Each of the subqueries will be correctly optimized using the index.

why "union all" is used and not just "union"? I don't think he wants to get duplicates. — user1770924, Nov 06 '22 at 12:57

SELECT with OR using two columns with index is very slow

1 Answers1