MySQL: WHERE IN any of subqueries

Question

How to restructure this query:

SELECT * FROM tbl t
WHERE (
       t.id IN <subquery1>
    OR t.id IN <subquery2>
    OR t.id IN <subquery3>
)

... into something that looks more like the following:

SELECT * FROM tbl t
WHERE t.id IN (<subquery1> OR <subquery2> OR <subquery3>)

Note: all 3 subqueries select from the same tbl t, but they select a different column each.

To clarify the subqueries a bit further with some concrete examples:

subquery1: SELECT col1 FROM tbl WHERE value=100
subquery2: SELECT col2 FROM tbl WHERE value=200
subquery3: SELECT col3 FROM tbl WHERE value=300

Table structure:

CREATE TABLE tbl (
    id      INTEGER   PRIMARY KEY,
    col1    INTEGER   not null,
    col2    INTEGER   not null,
    col3    INTEGER   not null,
    value   INTEGER   not null
);

*Note: all 3 subqueries select from the same table t, but they select a different column each.* If you use `t.id IN ` then each subquery returns only one column. What do you mean when you tell "different column"? — Akina, Jan 26 '21 at 07:46
You may use `WHERE t.id IN ( UNION ALL UNION ALL )`, but I doubt that this will improve. — Akina, Jan 26 '21 at 07:48
@Akina Indeed, they return 1 column each, so 3 subqueries return 3 different columns (col1, col2, col3) respectively. — mae, Jan 26 '21 at 07:50
Upload your subquerys and your table structure and maybe we can help. I think you may rewrite your subqueries to have just one (or use UNION ALL) as @Akina says — nacho, Jan 26 '21 at 07:52
The names of the columns means nothing. And I agree with @nacho - provide complete DDLs and precise queries texts. Also the query may be improved while rewrite to WHERE EXISTS. — Akina, Jan 26 '21 at 07:53
Does `table` in the query and in all 3 subqueries is the same table? — Akina, Jan 26 '21 at 07:56
@Akina I think `UNION` might be better because you don't need to worry about duplicate and condiction is `or`. — T. Peter, Jan 26 '21 at 07:58
@Akina , ah well that could be a problem.. but if the subqueries already limited to a few certain values, that might be different right? In any case, we all are curious about the subqueries — FanoFN, Jan 26 '21 at 08:00
IMHO, `union` or `union all` , this need execution plan to decide which one is better, but I'll guess if subquery return few rows and main table have a lot of rows than `union` might be faster, on the contrary, `union all` — T. Peter, Jan 26 '21 at 08:03
@T.Peter I doubt strongly. `WHERE IN (.. UNION ALL ..)` causes combining and sorting rowsets whereas `WHERE IN (.. UNION ALL ..)` additionally removes duplicates. Average compacting factor is 2 which gives one additional binary search step over sorted rowset. In general the profit of the compactization is less than the overhead of duplicate removing. From the other side UNION instead of UNION ALL may make sense when each separate subquery returns huge percent of duplicates and compacting factor is high enough (approx. over 20). — Akina, Jan 26 '21 at 08:10
agree, subquery have to return huge amount of duplicate for actual impact. — T. Peter, Jan 26 '21 at 08:13
Is there a reason that you want to restructure? How many rows in table , how many rows in each sub query.? — P.Salmon, Jan 26 '21 at 08:29
What is the approximate amount of rows: in the table, returned by each subquery, returned by the query? *Table structure* And no indices? — Akina, Jan 26 '21 at 08:29
@Akina The table has indices on `id`, `col1`, `col2`, `col3`. The size of the table is very large (millions of records), however the subqueries return a small amount (<10 of records). — mae, Jan 26 '21 at 08:33
' very large (millions of records); - doesn't say much. 2 million is millions but not very large. — P.Salmon, Jan 26 '21 at 08:46

score 5 · Answer 1 · answered Jan 26 '21 at 08:17

A quick test on a table integers with only the field i (and 2621441 rows):

SELECT i 
FROM integers
WHERE (
   i in (SELECT i FROM integers WHERE i = 100)
   OR
   i in (SELECT i FROM integers WHERE i = 200)
   OR 
   i in (SELECT i FROM integers WHERE i = 1000)
)
ORDER BY i;

+----+-------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
| id | select_type | table    | partitions | type  | possible_keys | key     | key_len | ref   | rows    | filtered | Extra                    |
+----+-------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
|  1 | PRIMARY     | integers | NULL       | index | NULL          | PRIMARY | 4       | NULL  | 2615753 |   100.00 | Using where; Using index |
|  4 | SUBQUERY    | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  3 | SUBQUERY    | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  2 | SUBQUERY    | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
+----+-------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
4 rows in set, 1 warning (0.01 sec)

Above returns result in about 2 secs.

SELECT i 
FROM integers
WHERE i in (
   SELECT i FROM integers WHERE i = 100
   UNION ALL
   SELECT i FROM integers WHERE i = 200
   UNION ALL
   SELECT i FROM integers WHERE i = 1000
)
ORDER BY i;

+----+--------------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
| id | select_type        | table    | partitions | type  | possible_keys | key     | key_len | ref   | rows    | filtered | Extra                    |
+----+--------------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
|  1 | PRIMARY            | integers | NULL       | index | NULL          | PRIMARY | 4       | NULL  | 2615753 |   100.00 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  3 | DEPENDENT UNION    | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  4 | DEPENDENT UNION    | integers | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
+----+--------------------+----------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
4 rows in set, 1 warning (0.00 sec)

Above returns results in about 1.35 sec

SELECT i 
FROM integers
WHERE i in (
   SELECT i FROM integers WHERE i = 100
   UNION
   SELECT i FROM integers WHERE i = 200
   UNION 
   SELECT i FROM integers WHERE i = 1000
)
ORDER BY i;

+----+--------------------+--------------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
| id | select_type        | table        | partitions | type  | possible_keys | key     | key_len | ref   | rows    | filtered | Extra                    |
+----+--------------------+--------------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
|  1 | PRIMARY            | integers     | NULL       | index | NULL          | PRIMARY | 4       | NULL  | 2615753 |   100.00 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | integers     | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  3 | DEPENDENT UNION    | integers     | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
|  4 | DEPENDENT UNION    | integers     | NULL       | const | PRIMARY       | PRIMARY | 4       | const |       1 |   100.00 | Using index              |
| NULL | UNION RESULT       | <union2,3,4> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL  |    NULL |     NULL | Using temporary          |
+----+--------------------+--------------+------------+-------+---------------+---------+---------+-------+---------+----------+--------------------------+
5 rows in set, 1 warning (0.00 sec)

Above returns results in 1.6 secs.

The 'winner' is UNION ALL

@mae The problem with this accepted answer is I have no idea which of the 2 choices is being accepted or why - ergo not very useful to forum Can you please give your reason for acceptance. — P.Salmon, Jan 26 '21 at 09:18

Akina · Accepted Answer · 2021-01-26T10:29:55.703

I have tested a lot of variants (synthetic table, 10kk rows, colX = random in 1..10kk, value = random in 1..1kk). The most fast is:

CREATE INDEX idx ON test (value);

SELECT id
FROM test
WHERE id in (SELECT col1 FROM test WHERE value = 100)
UNION
SELECT id
FROM test
WHERE id in (SELECT col2 FROM test WHERE value = 200)
UNION
SELECT id
FROM test
WHERE id in (SELECT col3 FROM test WHERE value = 1000)
ORDER BY id;

mysql> SELECT id
    -> FROM test
    -> WHERE id in (SELECT col1 FROM test WHERE value = 100)
    -> UNION
    -> SELECT id
    -> FROM test
    -> WHERE id in (SELECT col2 FROM test WHERE value = 200)
    -> UNION
    -> SELECT id
    -> FROM test
    -> WHERE id in (SELECT col3 FROM test WHERE value = 1000)
    -> ORDER BY id;
-- <output skipped>
36 rows in set (1.60 sec)

mysql> SELECT id
    -> FROM test
    -> WHERE (
    ->    id in (SELECT col1 FROM test WHERE value = 100)
    ->    OR
    ->    id in (SELECT col2 FROM test WHERE value = 200)
    ->    OR
    ->    id in (SELECT col3 FROM test WHERE value = 1000)
    -> )
    -> ORDER BY id;
-- <output skipped>
36 rows in set (29.18 sec)

This query is a better representation of the problem asked by the OP. (mae). — Luuk, Jan 26 '21 at 10:58

id'7238 · Answer 3 · 2021-01-26T11:13:48.340

Actually it can be done with one subquery:

SELECT * 
FROM tbl t1
WHERE t1.id IN (
  SELECT  
    CASE t2.value
        WHEN 100 THEN t2.col1
        WHEN 200 THEN t2.col2
        WHEN 1000 THEN t2.col3
    END AS id
  FROM tbl t2
  WHERE t2.value IN (100, 200, 1000)
)

Tested on 2474003 rows with index v4 on value column:

+----+--------------+-------------+------------+--------+---------------+---------+---------+--------------------+------+----------+-----------------------+
| id | select_type  | table       | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra                 |
+----+--------------+-------------+------------+--------+---------------+---------+---------+--------------------+------+----------+-----------------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               | NULL |   100.00 | NULL                  |
|  1 | SIMPLE       | t1          | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | <subquery2>.id     |    1 |   100.00 | Using where           |
|  2 | MATERIALIZED | t2          | NULL       | range  | v4            | v4      | 4       | NULL               |    7 |   100.00 | Using index condition |
+----+--------------+-------------+------------+--------+---------------+---------+---------+--------------------+------+----------+-----------------------+

Or without subqueries. If col1, col2, and col3 are a subset of id, then the subquery is the answer itself.

MySQL: WHERE IN any of subqueries

3 Answers3