Why is this simple join query significantly quicker with a sub-query?

Question

I have two tables: `order_details`, which has 100,000 rows, and `outbound`, which has 10,000 rows.

I need to join them on a column called `order_number`, which is a `VARCHAR(50)` in both tables. `order_number` is not unique in the `outbound` table.

```sql
CREATE TABLE `outbound` (
    `outbound_id` int(12) NOT NULL,
    `order_number` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `order_details` (
    `order_details_id` int(12) NOT NULL,
    `order_number` varchar(50) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```

This is my initial query, and it takes well over 60 seconds to run:

```sql
SELECT o.order_number
FROM outbound o
INNER JOIN order_details od
    ON o.order_number = od.order_number
```

This query gets the same results and takes less than a second to run:

```sql
SELECT o.order_number
FROM outbound o
INNER JOIN
(
    SELECT order_number
    FROM order_details
) od
ON (o.order_number = od.order_number)
```

This is surprising to me because usually sub-queries are significantly slower.

Running `EXPLAIN` (which I'm still learning to read) shows that the sub-query version uses a `<derived2>` table, that it is using an index, and that the index is `auto_key0`. I don't know enough to interpret this and understand why it makes such a big difference.
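One way to compare the two plans side by side is to prefix each query with `EXPLAIN`. This is a sketch against the two tables defined in the question; the exact columns and wording of the output vary between MySQL versions, so the comments describe what to look for rather than guaranteed output.

```sql
-- Plan for the plain join. With no index on order_number, both tables
-- typically show type = ALL (full table scan), and the second one shows
-- "Using join buffer (Block Nested Loop)" in Extra: every outbound row is
-- compared against every order_details row.
EXPLAIN
SELECT o.order_number
FROM outbound o
INNER JOIN order_details od
    ON o.order_number = od.order_number;

-- Plan for the derived-table version. MySQL 5.6 materializes the subquery
-- into a temporary table (listed as <derived2>) and may add an index to it
-- (listed under key as <auto_key0>), so matching becomes an index lookup.
EXPLAIN
SELECT o.order_number
FROM outbound o
INNER JOIN
(
    SELECT order_number
    FROM order_details
) od
ON (o.order_number = od.order_number);
```

When reading the output, `type = ALL` means a full scan of that table, while `type = ref` together with a value in `key` means rows are found through an index.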

I am running these queries from the command line.

I am running MySQL Ver 14.14 Distrib 5.6.35, for Linux (x86_64), on CentOS.

In summary:

Why is this simple join query significantly quicker with a sub-query?

MySQL's bad optimizer? Did you compare with an `EXISTS` or `IN`, too? `SELECT o.order_number FROM outbound o WHERE EXISTS( SELECT order_number FROM order_details AS od WHERE o.order_number = od.order_number)` or `SELECT o.order_number FROM outbound o WHERE order_number IN ( SELECT order_number FROM order_details )` — dnoeth, Jul 28 '17 at 13:33
@dnoeth that first query takes over a minute, that second query is instant. — Goose, Jul 28 '17 at 13:42
As I said, a decent optimizer should treat all four similarly (in fact, the joins might return a different result when `order_details.order_number` is not unique). — dnoeth, Jul 28 '17 at 13:52
It only matters in the `order_details` table, i.e. *many:one* vs. *one:many*. To avoid duplicate values you might have to add a `DISTINCT` to the joins. — dnoeth, Jul 28 '17 at 13:58
Maybe https://stackoverflow.com/questions/16385692/how-to-make-join-query-use-index helps? — LONG, Jul 28 '17 at 14:01
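Following dnoeth's point about duplicates: a join returns one row per matching pair, so an order number that appears several times in either table is repeated in the result. A minimal sketch of the `DISTINCT` variant he mentions, against the same two tables:

```sql
-- Returns each order number that appears in both tables exactly once,
-- regardless of how many matching rows exist on either side.
SELECT DISTINCT o.order_number
FROM outbound o
INNER JOIN order_details od
    ON o.order_number = od.order_number;
```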

Egl · Accepted Answer · 2017-07-28T13:50:43.280

My knowledge of MySQL is very limited. But these are my thoughts:

Your tables don't have indexes, so for each row of the first table the join has to read the entire second table to find matches.

The subquery version reads the second table once and builds an index on the result (the `auto_key0` you saw in `EXPLAIN`), so it doesn't need to read the entire second table for each row of the first table. It only has to look up each value in the index, which is much faster.

To verify whether I'm right, try creating indexes on the `order_number` column in both tables (CREATE INDEX ... ) and run the two queries again. Your first query should then take less than a second instead of a minute.
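A minimal sketch of that test, assuming the table definitions from the question. The index names are hypothetical; any names will do. With the `utf8` character set, a `VARCHAR(50)` key needs at most 150 bytes, which is within InnoDB's 767-byte index limit, so the whole column can be indexed.

```sql
-- Hypothetical index names. The index on order_details lets the plain join
-- look up matches for each outbound row; the index on outbound lets the
-- optimizer pick the opposite join order if it estimates that is cheaper.
CREATE INDEX idx_outbound_order_number ON outbound (order_number);
CREATE INDEX idx_order_details_order_number ON order_details (order_number);

-- Re-check the original query. One of the two rows in the plan should now
-- show type = ref with one of the new indexes under key, instead of both
-- tables being scanned in full.
EXPLAIN
SELECT o.order_number
FROM outbound o
INNER JOIN order_details od
    ON o.order_number = od.order_number;
```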

It was a pain to pull the data down to my dev environment to test this, but when I did, I found that your answer is accurate. Thanks for the answer; it makes sense. — Goose, Jul 28 '17 at 14:19