0

I’m a new SQL learner and a newbie to StackOverflow. Hope I didn't miss anything important for a first-time post.

I happened to get two following queries from my instructor saying they have different performance. But I couldn’t see why they are different in terms of the logic and computation cost.

Query 1:

SELECT First_Name,
SUM(total_sales_amount) AS sub_total_sales_amount  FROM
(
select A.First_Name, C.product_quantity * D.Retail_Price AS t  otal_sales_amount From join_demo.customer as A
inner join join_demo.customer_order as B  on A.customer_id = B.customer_id
inner join join_demo.order_details C  on B.order_id = C.order_id
inner join join_demo.product as D  on C.product_id= D.product_id
) E
GROUP BY 1
ORDER BY sub_total_sales_amount DESC  LIMIT 1;

Query 2 (I was told this one has better performance):

SELECT A.First_Name,  SUM(C.product_quantity * D.Retail_Price)  AS  sub_total_sales_amount
From join_demo.customer as A
inner join join_demo.customer_order as B  on A.customer_id = B.customer_id
inner join join_demo.order_details C  on B.order_id = C.order_id
inner join join_demo.product as D  on C.product_id= D.product_id  GROUP BY 1
ORDER BY sub_total_sales_amount DESC  LIMIT 1;

I’m running MySQL on my local Mac. But I suppose this one would be a general question regarding to SQL performance tuning. Could someone please shed light on this question? Much appreciated!

Updated: Thanks @Tim and @MatBailie. I added EXPLAIN before each query. The results are exactly the same. I guess two queries are on the same level of performance.

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE A NULL ALL NULL NULL NULL NULL 3 100 Using temporary; Using filesort
1 SIMPLE B NULL ALL NULL NULL NULL NULL 4 25 Using where; Using join buffer (hash join)
1 SIMPLE C NULL ALL NULL NULL NULL NULL 5 20 Using where; Using join buffer (hash join)
1 SIMPLE D NULL ALL NULL NULL NULL NULL 5 20 Using where; Using join buffer (hash join)
cococ0j
  • 11
  • 3
  • 1
    Why don't you ask your instructor? It's what they are there for. – HoneyBadger Jan 25 '21 at 09:19
  • @Larnu - given "I’m running MySQL on my local Mac." why did you think [tag:sql-server] was the right tag to keep? – Damien_The_Unbeliever Jan 25 '21 at 09:25
  • @Damien_The_Unbeliever my bad, must have clicked the wrong x. – Thom A Jan 25 '21 at 09:26
  • I personally doubt there would be any difference. I suggest running `EXPLAIN` on each query, and editing your question to include the plans being generated. I'd bet my breakfast that they're the same. If they are the same, you'll be able to educate your instructor. – MatBailie Jan 25 '21 at 09:32
  • @MatBailie, thanks Mat. You are absolutely right! Their execution plans are the same. Gordon (below answer) mentioned they might be different if running in old version SQL. So I guess for the course I took, they might not update their teaching materials to date. LOL. – cococ0j Jan 26 '21 at 10:55
  • (unrelated) It seems like you should `GROUP BY customer_id` not `GROUP BY first_name`. – Rick James Feb 08 '21 at 18:34

3 Answers3

1

Old versions of MySQL used to automatically materialize derived tables (subqueries in the FROM clause). "Materialize" means that MySQL runs the subquery and saves the results in a temporary location (in this case, before doing the aggregation).

I think the optimizer was improved starting with version 5.7 (although the history may be wrong). Nowadays, MySQL is smarter about materialization and will generally merge a subquery with the outer query.

Hence, more recent versions of MySQL should produce the same execution plan. Of course, optimizers can be confused and the optimizer may decide to materialize the subquery, which would slow down the query under most circumstances.

You can read more about this in the documentation.

You should also learn to use meaningful table aliases, such as c for customers. And, qualify all column references so it is clear where the columns come from. Arbitrary letters are probably worse than no aliases at all (assuming the columns are all qualified).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you, Gordon. I added `EXPLAIN` in the queries. The execution plans are the same. I guess in this particular scenario two queries are the same in terms of performance. And thank you for your advice at last. – cococ0j Jan 26 '21 at 10:37
0

The first query uses an explicit subquery to first generate an intermediate result containing, for each first name, every total amount. Then, it aggregates over name in the outer query to generate the sums you want. The second version does not use any such intermediate subquery, but instead directly aggregates on the joined tables. As a result, the first query may have extra overhead with regard to memory, and also performance, as MySQL has to aggregate over the intermediate table.

However, you should check the EXPLAIN plans of both queries to verify this. It is also possible that MySQL might be smart enough to execute the first query using the same plan as the second one.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Thank you, Tim. Yes, I followed your suggestion and added `EXPLAIN` at the beginning of two queries. The execution plans are the same. – cococ0j Jan 26 '21 at 10:40
0

Please provide SHOW CREATE TABLE.

It sounds like these indexes are missing:

B:  INDEX(customer_id)
C:  INDEX(order_id)
D:  INDEX(product_id)
Rick James
  • 135,179
  • 13
  • 127
  • 222