2

I am fairly new to MySQL and have this theoretical problem given to me. I am given these tables

customers
---------------
id
name
country
order_date


orders
---------------
id
order_number
order_type

customers_order_details
---------------
id
customer_id
order_id
price

A customer can have multiple different orders. I need to retrieve the customers with the largest total price spent, with the total price must be at least 100. Is my approach correct?

SELECT c.id, c.name AS customer_name, c.country , SUM(d.price) AS total_price
FROM customers c
JOIN customers_order_details d 
    ON c.id = d.customer_id
GROUP BY customer_name, 
HAVING total_price >= 100
ORDER BY total_price DESC;

I ask due to not sure since I was told for GROUP BY that I needed to add all columns specified but feel that using the name is more than adequate

GMB
  • 216,147
  • 25
  • 84
  • 135
user_2.177
  • 21
  • 2

2 Answers2

1

It looks almost correct.

Grouping by only customers.name isn't right though. Besides that this will throw an error on more tightly configured MySQL servers or newer versions or even DBMS from other vendors, what happens if there are two or more different customers with the same name, say some "John Smith"s? They're all aggregated in the same group giving false figures!

The safest bet is just to group by all columns not being an argument to an aggregation function. That would be customers.id, customers.name and customers.country in this case. In some DBMS you can also group by just a tuple of columns all the columns not given to an aggregation function are dependent of. If customers.id is declared as primary key, that would fulfill that rule and you could just group by it. But I'm not really sure if MySQL does implement that shortcut or in which versions or configurations. So you should better go with all the columns here.

Side note: The schema design is a little weird. Why are the order details directly linked to customers and not the orders themselves are linked to the customers? As it is now an order can have multiple details belonging to different customers. That may be right in your use case, but it's not the usual thing you would expect. Maybe you should revise that.

sticky bit
  • 36,626
  • 12
  • 31
  • 42
  • Thank you for your advice! In terms of what you stated, `customer.id` is a primary key within the `customers` table but are foreign keys in the `customer_order_details`. Would using `customer.id` still be suffiecient? – user_2.177 Dec 24 '20 at 19:05
  • @user_2.177: All the columns you select outside of aggregation functions are depending on `id` then as they're all from `customers`. Like I said, if your DBMS implements that short cut, then yes, grouping by `id` would suffice. The safe and more portable way is to group by all of that columns though. – sticky bit Dec 24 '20 at 19:08
0

Your code looks quite fine. I would jus recommend aggregating by the primary key of the customer table rather than by the name:

SELECT c.id, c.name AS customer_name, c.country , SUM(d.price) AS total_price
FROM customers c
JOIN customers_order_details d ON c.id = d.customer_id
GROUP BY c.id
HAVING SUM(d.price) >= 100
ORDER BY total_price DESC;

This makes the code a valid aggregation query; all non-aggregated columns in the select clause are functionally dependent on the column in the group by clause.

As a side note: using column aliases in the HAVING clause is a MySQL extension to the SQL standard. You can use that feature, or phrase the HAVING clause in pure ANSI SQL, repeating the aggregate expression.

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thank you for the input! Is there a caveat if I would use all other columns specified? such as using c.country, c.name with c.id? – user_2.177 Dec 24 '20 at 19:00
  • @user_2.177: `c.id` is the primary key of `c`, so you can put any column that you like from that table in the `select` clause: they are *functionally dependent* on `id`, so there is no need to put them in the `group by` clause (that would be redondant). See: https://dev.mysql.com/doc/refman/8.0/en/group-by-functional-dependence.html – GMB Dec 24 '20 at 19:11