2

How to restructure this query:

SELECT * FROM tbl t
WHERE (
       t.id IN <subquery1>
    OR t.id IN <subquery2>
    OR t.id IN <subquery3>
)

... into something that looks more like the following:

SELECT * FROM tbl t
WHERE t.id IN (<subquery1> OR <subquery2> OR <subquery3>)

Note: all 3 subqueries select from the same tbl t, but they select a different column each.

To clarify the subqueries a bit further with some concrete examples:

  • subquery1: SELECT col1 FROM tbl WHERE value=100
  • subquery2: SELECT col2 FROM tbl WHERE value=200
  • subquery3: SELECT col3 FROM tbl WHERE value=300

Table structure:

CREATE TABLE tbl (
    id      INTEGER   PRIMARY KEY,
    col1    INTEGER   not null,
    col2    INTEGER   not null,
    col3    INTEGER   not null,
    value   INTEGER   not null
);
mae
  • 14,947
  • 8
  • 32
  • 47
  • *Note: all 3 subqueries select from the same table t, but they select a different column each.* If you use `t.id IN ` then each subquery returns only one column. What do you mean when you tell "different column"? – Akina Jan 26 '21 at 07:46
  • 3
    You may use `WHERE t.id IN ( UNION ALL UNION ALL )`, but I doubt that this will improve. – Akina Jan 26 '21 at 07:48
  • @Akina Indeed, they return 1 column each, so 3 subqueries return 3 different columns (col1, col2, col3) respectively. – mae Jan 26 '21 at 07:50
  • 1
    Upload your subquerys and your table structure and maybe we can help. I think you may rewrite your subqueries to have just one (or use UNION ALL) as @Akina says – nacho Jan 26 '21 at 07:52
  • The names of the columns means nothing. And I agree with @nacho - provide complete DDLs and precise queries texts. Also the query may be improved while rewrite to WHERE EXISTS. – Akina Jan 26 '21 at 07:53
  • Maybe `join` instead of `in`? – FanoFN Jan 26 '21 at 07:55
  • Does `table` in the query and in all 3 subqueries is the same table? – Akina Jan 26 '21 at 07:56
  • @fadlikidd Million join million 3 times == hang. – Akina Jan 26 '21 at 07:57
  • @Akina I think `UNION` might be better because you don't need to worry about duplicate and condiction is `or`. – T. Peter Jan 26 '21 at 07:58
  • @Akina , ah well that could be a problem.. but if the subqueries already limited to a few certain values, that might be different right? In any case, we all are curious about the subqueries – FanoFN Jan 26 '21 at 08:00
  • IMHO, `union` or `union all` , this need execution plan to decide which one is better, but I'll guess if subquery return few rows and main table have a lot of rows than `union` might be faster, on the contrary, `union all` – T. Peter Jan 26 '21 at 08:03
  • 1
    @T.Peter I doubt strongly. `WHERE IN (.. UNION ALL ..)` causes combining and sorting rowsets whereas `WHERE IN (.. UNION ALL ..)` additionally removes duplicates. Average compacting factor is 2 which gives one additional binary search step over sorted rowset. In general the profit of the compactization is less than the overhead of duplicate removing. From the other side UNION instead of UNION ALL may make sense when each separate subquery returns huge percent of duplicates and compacting factor is high enough (approx. over 20). – Akina Jan 26 '21 at 08:10
  • agree, subquery have to return huge amount of duplicate for actual impact. – T. Peter Jan 26 '21 at 08:13
  • Is there a reason that you want to restructure? How many rows in table , how many rows in each sub query.? – P.Salmon Jan 26 '21 at 08:29
  • What is the approximate amount of rows: in the table, returned by each subquery, returned by the query? *Table structure* And no indices? – Akina Jan 26 '21 at 08:29
  • @Akina The table has indices on `id`, `col1`, `col2`, `col3`. The size of the table is very large (millions of records), however the subqueries return a small amount (<10 of records). – mae Jan 26 '21 at 08:33
  • The index `(value, colX)` may improve. – Akina Jan 26 '21 at 08:35
  • ' very large (millions of records); - doesn't say much. 2 million is millions but not very large. – P.Salmon Jan 26 '21 at 08:46
  • @P.Salmon ~41 million roughly – mae Jan 26 '21 at 08:52

3 Answers3

5

A quick test on a table integers with only the field i (and 2621441 rows):

SELECT i 
FROM integers
WHERE (
   i in (SELECT i FROM integers WHERE i = 100)
   OR
   i in (SELECT i FROM integers WHERE i = 200)
   OR 
   i in (SELECT i FROM integers WHERE i = 1000)
)
ORDER BY i;

+----+-------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
| id | select_type | table    | partitions | type  | possible_keys | key     | key_len | ref   | rows    | filtered | Extra                    |
+----+-------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
|  1 | PRIMARY     | integers | NULL       | index | NULL          | PRIMARY | 4       | NULL  | 2615753 |   100.00 | Using where; Using index |
|  4 | SUBQUERY    | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  3 | SUBQUERY    | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  2 | SUBQUERY    | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
+----+-------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
4 rows in set, 1 warning (0.01 sec)

Above returns result in about 2 secs.

SELECT i 
FROM integers
WHERE i in (
   SELECT i FROM integers WHERE i = 100
   UNION ALL
   SELECT i FROM integers WHERE i = 200
   UNION ALL
   SELECT i FROM integers WHERE i = 1000
)
ORDER BY i;

+----+--------------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
| id | select_type        | table    | partitions | type  | possible_keys | key     | key_len | ref   | rows    | filtered | Extra                    |
+----+--------------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
|  1 | PRIMARY            | integers | NULL       | index | NULL          | PRIMARY | 4       | NULL  | 2615753 |   100.00 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  3 | DEPENDENT UNION    | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  4 | DEPENDENT UNION    | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
+----+--------------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
4 rows in set, 1 warning (0.00 sec)

Above returns results in about 1.35 sec

SELECT i 
FROM integers
WHERE i in (
   SELECT i FROM integers WHERE i = 100
   UNION
   SELECT i FROM integers WHERE i = 200
   UNION 
   SELECT i FROM integers WHERE i = 1000
)
ORDER BY i;

+----+--------------------+--------------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
| id | select_type        | table        | partitions | type  | possible_keys | key     | key_len | ref   | rows    | filtered | Extra                    |
+----+--------------------+--------------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
|  1 | PRIMARY            | integers     | NULL       | index | NULL          | PRIMARY | 4       | NULL  | 2615753 |   100.00 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | integers     | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  3 | DEPENDENT UNION    | integers     | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  4 | DEPENDENT UNION    | integers     | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
| NULL | UNION RESULT       | <union2,3,4> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL  |    NULL |     NULL | Using temporary          |
+----+--------------------+--------------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
5 rows in set, 1 warning (0.00 sec)

Above returns results in 1.6 secs.

The 'winner' is UNION ALL

Luuk
  • 12,245
  • 5
  • 22
  • 33
  • @mae The problem with this accepted answer is I have no idea which of the 2 choices is being accepted or why - ergo not very useful to forum Can you please give your reason for acceptance. – P.Salmon Jan 26 '21 at 09:18
4

I have tested a lot of variants (synthetic table, 10kk rows, colX = random in 1..10kk, value = random in 1..1kk). The most fast is:

CREATE INDEX idx ON test (value);
SELECT id
FROM test
WHERE id in (SELECT col1 FROM test WHERE value = 100)
UNION
SELECT id
FROM test
WHERE id in (SELECT col2 FROM test WHERE value = 200)
UNION
SELECT id
FROM test
WHERE id in (SELECT col3 FROM test WHERE value = 1000)
ORDER BY id;

mysql> SELECT id
    -> FROM test
    -> WHERE id in (SELECT col1 FROM test WHERE value = 100)
    -> UNION
    -> SELECT id
    -> FROM test
    -> WHERE id in (SELECT col2 FROM test WHERE value = 200)
    -> UNION
    -> SELECT id
    -> FROM test
    -> WHERE id in (SELECT col3 FROM test WHERE value = 1000)
    -> ORDER BY id;
-- <output skipped>
36 rows in set (1.60 sec)

mysql> SELECT id
    -> FROM test
    -> WHERE (
    ->    id in (SELECT col1 FROM test WHERE value = 100)
    ->    OR
    ->    id in (SELECT col2 FROM test WHERE value = 200)
    ->    OR
    ->    id in (SELECT col3 FROM test WHERE value = 1000)
    -> )
    -> ORDER BY id;
-- <output skipped>
36 rows in set (29.18 sec)
Akina
  • 39,301
  • 5
  • 14
  • 25
3

Actually it can be done with one subquery:

SELECT * 
FROM tbl t1
WHERE t1.id IN (
  SELECT  
    CASE t2.value
        WHEN 100 THEN t2.col1
        WHEN 200 THEN t2.col2
        WHEN 1000 THEN t2.col3
    END AS id
  FROM tbl t2
  WHERE t2.value IN (100, 200, 1000)
)

Tested on 2474003 rows with index v4 on value column:

+----+--------------+-------------+------------+--------+---------------+---------+---------+--------------------+------+----------+-----------------------+
| id | select_type  | table       | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra                 |
+----+--------------+-------------+------------+--------+---------------+---------+---------+--------------------+------+----------+-----------------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               | NULL |   100.00 | NULL                  |
|  1 | SIMPLE       | t1          | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | <subquery2>.id     |    1 |   100.00 | Using where           |
|  2 | MATERIALIZED | t2          | NULL       | range  | v4            | v4      | 4       | NULL               |    7 |   100.00 | Using index condition |
+----+--------------+-------------+------------+--------+---------------+---------+---------+--------------------+------+----------+-----------------------+

Or without subqueries. If col1, col2, and col3 are a subset of id, then the subquery is the answer itself.

id'7238
  • 2,428
  • 1
  • 3
  • 11