Is there any performance difference between those two SQL queries?

Question

I’m a new SQL learner and a newbie to StackOverflow. Hope I didn't miss anything important for a first-time post.

I happened to get two following queries from my instructor saying they have different performance. But I couldn’t see why they are different in terms of the logic and computation cost.

Query 1:

SELECT First_Name,
SUM(total_sales_amount) AS sub_total_sales_amount  FROM
(
select A.First_Name, C.product_quantity * D.Retail_Price AS t  otal_sales_amount From join_demo.customer as A
inner join join_demo.customer_order as B  on A.customer_id = B.customer_id
inner join join_demo.order_details C  on B.order_id = C.order_id
inner join join_demo.product as D  on C.product_id= D.product_id
) E
GROUP BY 1
ORDER BY sub_total_sales_amount DESC  LIMIT 1;

Query 2 (I was told this one has better performance):

SELECT A.First_Name,  SUM(C.product_quantity * D.Retail_Price)  AS  sub_total_sales_amount
From join_demo.customer as A
inner join join_demo.customer_order as B  on A.customer_id = B.customer_id
inner join join_demo.order_details C  on B.order_id = C.order_id
inner join join_demo.product as D  on C.product_id= D.product_id  GROUP BY 1
ORDER BY sub_total_sales_amount DESC  LIMIT 1;

I’m running MySQL on my local Mac. But I suppose this one would be a general question regarding to SQL performance tuning. Could someone please shed light on this question? Much appreciated!

Updated: Thanks @Tim and @MatBailie. I added EXPLAIN before each query. The results are exactly the same. I guess two queries are on the same level of performance.

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	A	NULL	ALL	NULL	NULL	NULL	NULL	3	100	Using temporary; Using filesort
1	SIMPLE	B	NULL	ALL	NULL	NULL	NULL	NULL	4	25	Using where; Using join buffer (hash join)
1	SIMPLE	C	NULL	ALL	NULL	NULL	NULL	NULL	5	20	Using where; Using join buffer (hash join)
1	SIMPLE	D	NULL	ALL	NULL	NULL	NULL	NULL	5	20	Using where; Using join buffer (hash join)

Why don't you ask your instructor? It's what they are there for. — HoneyBadger, Jan 25 '21 at 09:19
@Larnu - given "I’m running MySQL on my local Mac." why did you think [tag:sql-server] was the right tag to keep? — Damien_The_Unbeliever, Jan 25 '21 at 09:25
@Damien_The_Unbeliever my bad, must have clicked the wrong x. — Thom A, Jan 25 '21 at 09:26
I personally doubt there would be any difference. I suggest running `EXPLAIN` on each query, and editing your question to include the plans being generated. I'd bet my breakfast that they're the same. If they are the same, you'll be able to educate your instructor. — MatBailie, Jan 25 '21 at 09:32
@MatBailie, thanks Mat. You are absolutely right! Their execution plans are the same. Gordon (below answer) mentioned they might be different if running in old version SQL. So I guess for the course I took, they might not update their teaching materials to date. LOL. — cococ0j, Jan 26 '21 at 10:55
(unrelated) It seems like you should `GROUP BY customer_id` not `GROUP BY first_name`. — Rick James, Feb 08 '21 at 18:34

score 1 · Accepted Answer · answered Jan 25 '21 at 12:14

Old versions of MySQL used to automatically materialize derived tables (subqueries in the FROM clause). "Materialize" means that MySQL runs the subquery and saves the results in a temporary location (in this case, before doing the aggregation).

I think the optimizer was improved starting with version 5.7 (although the history may be wrong). Nowadays, MySQL is smarter about materialization and will generally merge a subquery with the outer query.

Hence, more recent versions of MySQL should produce the same execution plan. Of course, optimizers can be confused and the optimizer may decide to materialize the subquery, which would slow down the query under most circumstances.

You can read more about this in the documentation.

You should also learn to use meaningful table aliases, such as c for customers. And, qualify all column references so it is clear where the columns come from. Arbitrary letters are probably worse than no aliases at all (assuming the columns are all qualified).

Thank you, Gordon. I added `EXPLAIN` in the queries. The execution plans are the same. I guess in this particular scenario two queries are the same in terms of performance. And thank you for your advice at last. — cococ0j, Jan 26 '21 at 10:37

score 0 · Answer 2 · answered Jan 25 '21 at 09:23

The first query uses an explicit subquery to first generate an intermediate result containing, for each first name, every total amount. Then, it aggregates over name in the outer query to generate the sums you want. The second version does not use any such intermediate subquery, but instead directly aggregates on the joined tables. As a result, the first query may have extra overhead with regard to memory, and also performance, as MySQL has to aggregate over the intermediate table.

However, you should check the EXPLAIN plans of both queries to verify this. It is also possible that MySQL might be smart enough to execute the first query using the same plan as the second one.

Thank you, Tim. Yes, I followed your suggestion and added `EXPLAIN` at the beginning of two queries. The execution plans are the same. — cococ0j, Jan 26 '21 at 10:40

score 0 · Answer 3 · answered Feb 08 '21 at 18:36

0

Please provide SHOW CREATE TABLE.

It sounds like these indexes are missing:

B:  INDEX(customer_id)
C:  INDEX(order_id)
D:  INDEX(product_id)

answered Feb 08 '21 at 18:36

Rick James

135,179
13
127
222

Is there any performance difference between those two SQL queries?

3 Answers3