
In theory, which of these would return results faster? I'm dealing with almost half a billion rows in a table and coming up with a plan to remove quite a few of them. I need to ensure I'm choosing the quickest possible approach.

+----+-------------+------------------+------+---------------+------+---------+------+-----------+---------------------------------+
| id | select_type | table            | type | possible_keys | key  | key_len | ref  | rows      | Extra                           |
+----+-------------+------------------+------+---------------+------+---------+------+-----------+---------------------------------+
|  1 | PRIMARY     | tableA           | ALL  | NULL          | NULL | NULL    | NULL | 505432976 | Using where                     |
|  2 | SUBQUERY    | tableA           | ALL  | NULL          | NULL | NULL    | NULL | 505432976 | Using temporary; Using filesort |
+----+-------------+------------------+------+---------------+------+---------+------+-----------+---------------------------------+
2 rows in set (0.00 sec)

+----+-------------+------------------+--------+---------------------------------------------+---------+---------+-----------+-----------+---------------------------------+
| id | select_type | table            | type   | possible_keys                               | key     | key_len | ref       | rows      | Extra                           |
+----+-------------+------------------+--------+---------------------------------------------+---------+---------+-----------+-----------+---------------------------------+
|  1 | PRIMARY     | <derived2>       | ALL    | NULL                                        | NULL    | NULL    | NULL      | 505432976 | Using where                     |
|  1 | PRIMARY     | a1               | eq_ref | PRIMARY,FK_address_1,idx_address_1           | PRIMARY | 8       | t2.max_id |         1 | Using where                     |
|  2 | DERIVED     | tableA           | ALL    | NULL                                        | NULL    | NULL    | NULL      | 505432976 | Using temporary; Using filesort |
+----+-------------+------------------+--------+---------------------------------------------+---------+---------+-----------+-----------+---------------------------------+
3 rows in set (0.01 sec)
user3299633
The query plan is just one small piece; if you want meaningful advice, edit your question to show the queries themselves and the output of `SHOW CREATE TABLE yourtablename` for all tables, and be willing to answer questions about cardinality, etc. – ysth Jul 19 '21 at 23:01

1 Answer


Your question may be focused on "subquery" versus "derived table".
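The queries themselves weren't shown, but EXPLAINs of this shape typically come from queries roughly like the following (the column names `address_id` and the grouping logic are hypothetical; the alias `t2.max_id` and the `eq_ref` lookup on `a1`'s PRIMARY KEY do appear in your second EXPLAIN):

```sql
-- Hypothetical "subquery" form (matches the first EXPLAIN):
SELECT *
FROM tableA
WHERE id NOT IN ( SELECT MAX(id) FROM tableA GROUP BY address_id );

-- Hypothetical "derived table" form (matches the second EXPLAIN):
SELECT a1.*
FROM ( SELECT MAX(id) AS max_id FROM tableA GROUP BY address_id ) AS t2
JOIN tableA AS a1  ON a1.id = t2.max_id;
```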

And your question is really about deleting a large part of a table. If that is the goal, you can skip my discussion of EXPLAIN and go straight to the link below. That is, neither of these is "the quickest"!

Explaining the EXPLAINs

A very crude way to use the EXPLAIN is to multiply the values in the "rows" column. In the first query, that is (505432976 * 505432976). This tells me that the query could take years, maybe centuries, to run. The plan effectively says "for each row in the PRIMARY query, scan all of the SUBQUERY".

In the second ("DERIVED") query, look at each "table" row; whether to multiply or add the estimates together "depends". I think "add" applies here -- (505432976 + 505432976). Bad, but not nearly as terrible. The plan effectively says "first copy all of the derived tableA into a temp table, then scan all of that temp table to get the final results."

ALL means a "table scan", which may mean that there is no useful index. Or it may mean that you are deliberately looking at all rows of each 500M-row table.

Caveat: LIMIT is usually not factored into the numbers in EXPLAIN. And sometimes LIMIT does not shorten the execution time anyway.

Each table must have a PRIMARY KEY. Secondary indexes are often very useful. "Composite" indexes are often better than single-column indexes. Look at the WHERE clause for what column(s) should be indexed. (The art of indexing is much more complex than that, but this would get you started.)
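For instance, if the WHERE clause filters on two columns, a single "composite" index covering both usually beats two single-column indexes (the column names here are hypothetical, since the actual queries weren't shown):

```sql
-- Hypothetical: the query filters with
--   WHERE address_id = ? AND created_at < ?
-- A composite index lets both conditions be resolved in one index:
ALTER TABLE tableA
    ADD INDEX idx_addr_created (address_id, created_at);
```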

See also EXPLAIN FORMAT=JSON SELECT ...
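The JSON format exposes per-table cost estimates and materialization details that the tabular EXPLAIN hides. Using the hypothetical derived-table query from above:

```sql
-- Shows attached conditions, materialization of the derived table,
-- and cost estimates per table:
EXPLAIN FORMAT=JSON
SELECT a1.*
FROM ( SELECT MAX(id) AS max_id FROM tableA GROUP BY address_id ) AS t2
JOIN tableA AS a1  ON a1.id = t2.max_id;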

Show us the queries and tell us about what you need to delete (or "keep")!

Plan to remove quite a few

It may be much faster to copy over the rows you want to keep.
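A sketch of that copy-and-swap approach (the keep-condition is hypothetical; the real one depends on your queries):

```sql
-- Build a fresh table containing only the rows to keep,
-- then swap it in atomically:
CREATE TABLE tableA_new LIKE tableA;

INSERT INTO tableA_new
    SELECT * FROM tableA
    WHERE created_at >= '2020-01-01';   -- hypothetical keep-condition

RENAME TABLE tableA TO tableA_old,
             tableA_new TO tableA;

DROP TABLE tableA_old;
```

The RENAME is a single atomic swap, so readers never see a half-built table; the big win is that you never pay the row-by-row cost of DELETE on the rows being discarded.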

I discuss various techniques for deleting lots of rows. Reading that may save you a lot of grief with your 500M rows!
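One of those techniques is chunked deletion: remove rows in small batches, walking the PRIMARY KEY, so each statement holds locks briefly and the undo log stays small. A sketch (the delete-condition and batch size are hypothetical):

```sql
-- Delete in PK-ordered chunks; repeat, advancing the id range,
-- until the whole table has been walked:
DELETE FROM tableA
WHERE id >= 1 AND id < 1 + 1000          -- current 1000-id chunk
  AND created_at < '2020-01-01';         -- hypothetical delete-condition
```

Walking the PRIMARY KEY range is usually preferable to repeating `DELETE ... LIMIT 1000`, which re-scans already-cleared rows on every pass.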

Rick James