3

There is a table with the following definition in my Cockroach database:

CREATE TABLE foo_value (
    foo_id_a INT NOT NULL,
    foo_id_b INT NOT NULL,
    value FLOAT NULL,
    create_date_time TIMESTAMP NULL,
    update_date_time TIMESTAMP NULL,
    CONSTRAINT "primary" PRIMARY KEY (foo_id_a ASC, foo_id_b ASC),
    INDEX foo_value_foo_id_a_foo_id_b_idx (foo_id_a ASC, foo_id_b ASC),
    INDEX foo_id_a_idx (foo_id_a ASC),
    INDEX foo_id_b_idx (foo_id_b ASC),
    FAMILY "primary" (foo_id_a, foo_id_b, value, create_date_time, update_date_time)
)

It contains roughly 400000 rows.

Querying for one of the two IDs is fast:

SELECT * FROM foo_db.foo_value WHERE foo_id_a = 123456;
takes 0.071 s

SELECT * FROM foo_db.foo_value WHERE foo_id_b = 123456;
takes 0.086 s

However querying for one OR the other is very slow:

SELECT * FROM foo_db.foo_value WHERE foo_id_a = 123456 OR foo_id_b = 123456;
takes 2.739 s

Why is this?

The results from EXPLAIN look as follows:

EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_a = 321210483;
+-------+------+-------+-----------------------+
| Level | Type | Field |      Description      |
+-------+------+-------+-----------------------+
|     0 | scan |       |                       |
|     0 |      | table | foo_value@primary     |
|     0 |      | spans | /321210483-/321210484 |
+-------+------+-------+-----------------------+


EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_b = 321210483;
+-------+------------+-------+------------------------+
| Level |    Type    | Field |      Description       |
+-------+------------+-------+------------------------+
|     0 | index-join |       |                        |
|     1 | scan       |       |                        |
|     1 |            | table | foo_value@foo_id_b_idx |
|     1 |            | spans | /321210483-/321210484  |
|     1 | scan       |       |                        |
|     1 |            | table | foo_value@primary      |
+-------+------------+-------+------------------------+


EXPLAIN SELECT * FROM foo_db.foo_value WHERE foo_id_a = 321210483 OR foo_id_b = 321210483;
+-------+------+-------+-------------------+
| Level | Type | Field |    Description    |
+-------+------+-------+-------------------+
|     0 | scan |       |                   |
|     0 |      | table | foo_value@primary |
|     0 |      | spans | ALL               |
+-------+------+-------+-------------------+
Tobias Hermann
  • 9,936
  • 6
  • 61
  • 134

1 Answers1

7

What you are asking for is an index optimization that uses two different indexes in the or. Alas, SQL engines do not in general support this optimization (although Oracle does as might some other databases).

You are best off using union all:

SELECT *
FROM foo_db.foo_value
WHERE foo_id_a = 123456;
UNION ALL
SELECT *
FROM foo_db.foo_value
WHERE foo_id_b = 123456 AND foo_id_a <> 123456;

(Note: You might need to take NULL values into account for the second WHERE clause.)

Each of the subqueries will be correctly optimized using the index.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786