SUM(DISTINCT) Based on Other Columns

Question

I currently have a table that looks something like this:

+------+-------+------------+------------+
| id   | rate  | first_name | last_name  |
+------+-------+------------+------------+

What I need to do is get the SUM of the rate column, but only once for each name. For example, I have three rows of name John Doe, each with rate 8. I need the SUM of those rows to be 8, not 24, so it counts the rate once for each group of names.

SUM(DISTINCT last_name, first_name) would not work, of course, because I'm trying to sum the rate column, not the names. I know when counting individual records, I can use COUNT(DISTINCT last_name, first_name), and that is the type of behavior I am trying to get from SUM.

How can I SUM just one rate for each name?

Thanks in advance!

score 15 · Accepted Answer · answered Aug 02 '12 at 17:03

select sum(rate)
from yourTable
group by first_name, last_name

Edit

If you want the sum of all those little "sums", you will simply get the sum of the whole table:

Select sum(rate) from YourTable

But if for some reason the two results differ (if you use a WHERE in the grouped query, for example) and you need the sum of the select above, just do:

select sum(SumGrouped) from
(   select sum(rate) as SumGrouped
    from yourTable
    group by first_name, last_name) T1
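
To see what these three queries return, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The six sample rows are made up for illustration (two names, three rows each, rates 8 and 7); only the table and column names come from the answer.

```python
import sqlite3

# Hypothetical sample data: (id, rate, first_name, last_name).
ROWS = [
    (1, 8, "John", "Doe"), (2, 8, "John", "Doe"), (3, 8, "John", "Doe"),
    (4, 7, "Jane", "Roe"), (5, 7, "Jane", "Roe"), (6, 7, "Jane", "Roe"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (id INTEGER, rate INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", ROWS)

# First query: one sum per name (every row of that name is added).
per_name = conn.execute(
    "SELECT first_name, last_name, SUM(rate) FROM yourTable "
    "GROUP BY first_name, last_name ORDER BY last_name"
).fetchall()

# Second query: plain total over the whole table.
total = conn.execute("SELECT SUM(rate) FROM yourTable").fetchone()[0]

# Third query: sum of the per-name sums.
sum_of_sums = conn.execute(
    "SELECT SUM(SumGrouped) FROM ("
    "  SELECT SUM(rate) AS SumGrouped FROM yourTable"
    "  GROUP BY first_name, last_name) T1"
).fetchone()[0]

print(per_name, total, sum_of_sums)
```

With this data the per-name sums are 24 and 21, and both the plain total and the sum of sums come out as 45, so none of the three queries counts a name's rate only once.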

answered Aug 02 '12 at 17:03

Gonzalo.-

Now, I apologize for not mentioning this in the original question: how could I, then, get a sum of all those new `SUM`s? I know I could do it in PHP, but is it possible in that query to get a `SUM` of the result? – David Aug 02 '12 at 17:05
That second query results in `SUM(rate)` without any `GROUP BY` influence, and I have no idea why. – David Aug 02 '12 at 17:13
If you sum a field from a table and show it grouped (but you sum ALL the rows), you will get the same result as summing the field directly. That's why I added that select, with that in mind. If that's not your case, use the third query – Gonzalo.- Aug 02 '12 at 17:17
Wait, maybe I didn't understand you. If you want the sum of all those sums, and also each of the new sums, you will have to run two queries. To get the new sums, use the first query. For the sum of sums, use the second or third, according to what you need – Gonzalo.- Aug 02 '12 at 17:20
I apologize for being unclear. Say I have six rows, two sets of three names. One set of names (three rows) has a 'rate' of 7 and the other set of names (three rows) has a rate of 8. I'm trying to get 15 out of those 6 rows. A name only gets added to the `SUM` once, no matter how many rows there are for that name. – David Aug 02 '12 at 17:23
I found my solution: `SELECT SUM(rate) FROM (SELECT * FROM records GROUP BY last_name, first_name) T1` Thanks for your help, it really did get me set in the right direction. – David Aug 02 '12 at 17:25
I don't think this answer works. As mentioned above, it just returns the total sum. See @Georgy Vladimirov's answer for something that really works. – juacala Feb 27 '17 at 16:15

score 8 · Answer 2 · edited Mar 22 '17 at 14:10

David said he found his answer as such:

SELECT SUM(rate) FROM (SELECT * FROM records GROUP BY last_name, first_name) T1

But when you do the GROUP BY in the inner query, I think you have to use aggregate functions in your SELECT. So, I think the answer is more like:

SELECT SUM(rate) FROM (SELECT MAX(rate) AS rate FROM records GROUP BY last_name, first_name) T1

I picked MAX() to pick only one "rate" for a "last_name, first_name" combination but MIN() should work the same, assuming that the "last_name, first_name" always leads us to the same "rate" even when it happens multiple times in the table. This seems to be David's original assumption - that for a unique name we want to grab the rate only once because we know it will be the same.
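
As a quick check of that query, here is a minimal sketch run against SQLite (via Python's sqlite3) rather than MySQL. The sample rows are hypothetical and follow David's example from the comments: two names, three rows each, rates 7 and 8, expected result 15.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER, rate INTEGER, first_name TEXT, last_name TEXT)"
)
# Hypothetical data: each name appears three times with the same rate.
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, 8, "John", "Doe"), (2, 8, "John", "Doe"), (3, 8, "John", "Doe"),
        (4, 7, "Jane", "Roe"), (5, 7, "Jane", "Roe"), (6, 7, "Jane", "Roe"),
    ],
)

def sum_once_per_name(aggregate):
    """Pick one rate per name with the given aggregate, then sum the picks."""
    query = (
        f"SELECT SUM(rate) FROM ("
        f"  SELECT {aggregate}(rate) AS rate FROM records"
        f"  GROUP BY last_name, first_name) T1"
    )
    return conn.execute(query).fetchone()[0]

plain_total = conn.execute("SELECT SUM(rate) FROM records").fetchone()[0]
with_max = sum_once_per_name("MAX")
with_min = sum_once_per_name("MIN")
print(plain_total, with_max, with_min)
```

Here the plain total is 45, while both the MAX() and MIN() variants give 15, because every name maps to a single rate in this data.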

score 7 · Answer 3 · answered Mar 13 '19 at 16:47

You can do this by making the values you are summing distinct. This is possible but is very very ugly.

First, you can turn a string into a number by taking a hash. The SQL below does an MD5 hash of the first and last name, which returns 32 hexadecimal digits. SUBSTRING takes the first 8 of these, and CONV turns that into a number of up to 10 digits (it's theoretically possible this won't be unique):

CONV(SUBSTRING(MD5(CONCAT(first_name,last_name)), 1, 8), 16, 10)

Then you divide that by a very big number and add it to the rate. You'll end up with a rate like 8.0000019351087950. You have to use FORMAT to avoid MySQL truncating the decimal places. This rate will now be unique for each first name and last name.

FORMAT(rate + CONV(SUBSTRING(MD5(CONCAT(first_name,last_name)), 1, 8), 16, 10)/1000000000000000, 16)

And then if you do the SUM DISTINCT over that it will only count the 8 once. Then you need to FLOOR the result to get rid of the extra decimal places:

FLOOR(SUM(DISTINCT FORMAT(rate + CONV(SUBSTRING(MD5(CONCAT(first_name,last_name)), 1, 8), 16, 10)/1000000000000000, 16)))

I found this approach while doing a much more complicated query which joined and grouped several tables. I'm still not sure if I'll use it as it is pretty horrible, but it does work. It's also 6 years too late to be of any use to the person who asked the question.
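
The arithmetic behind the trick can be sketched outside the database. The snippet below mirrors the MySQL expression in plain Python (hashlib and Decimal) instead of running it in MySQL, so treat it as an illustration of the idea only. The sample rows and the helper names `name_tag` and `tagged_rate` are made up.

```python
import hashlib
import math
from decimal import Decimal

def name_tag(first_name, last_name):
    """Mirror CONV(SUBSTRING(MD5(CONCAT(first, last)), 1, 8), 16, 10)."""
    digest = hashlib.md5((first_name + last_name).encode("utf-8")).hexdigest()
    return int(digest[:8], 16)  # first 8 hex digits as an integer

def tagged_rate(rate, first_name, last_name):
    """Add a tiny name-dependent fraction so equal rates of different names differ."""
    return Decimal(rate) + Decimal(name_tag(first_name, last_name)) / Decimal(10**15)

# Hypothetical rows: two different names that happen to share the same rate.
rows = [(8, "John", "Doe")] * 3 + [(8, "Jane", "Roe")] * 3

# SUM(DISTINCT rate) alone collapses both names into a single 8.
plain_distinct = sum({rate for rate, _, _ in rows})

# SUM(DISTINCT tagged rate), then FLOOR, counts each name once.
tagged_distinct = math.floor(sum({tagged_rate(*row) for row in rows}))

print(plain_distinct, tagged_distinct)
```

For these rows the plain distinct sum is 8 and the tagged one is 16. Each tag is below about 0.0000043, so the fractions could only add up to a whole unit (and throw off the FLOOR) once there are a couple hundred thousand distinct names.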

You posted an answer to a question that is almost 7 years old and already has an answer marked as accepted? – David Mar 13 '19 at 20:18
Yes I did. People are still searching for answers to this question (like I did yesterday) and perhaps they'd appreciate seeing a different approach. – Paul Mar 14 '19 at 11:52

score 2 · Answer 4 · answered Aug 02 '12 at 17:12

SELECT SUM(rate)
FROM [TABLE] 
GROUP BY first_name, last_name;

answered Aug 02 '12 at 17:12

Vasil Nikolov

score 2 · Answer 5 · answered Jan 21 '22 at 17:26

Recently, I came across a similar problem, but with the exception that I already had a GROUP BY clause for a different purpose. Here is an example:

SELECT r.name, SUM(r.rate), MIN(e.created_at)
FROM Rates r LEFT JOIN Events e ON r.id = e.rate_id
GROUP BY r.id

The problem here is that, because of the JOIN with Events, SUM(r.rate) counts the rate once per joined row, so entries with multiple Events are summed multiple times. In my case the query was a lot more complicated, so I wanted to avoid having extra subqueries. Luckily, there is an elegant solution:

SELECT r.name, SUM(r.rate) / GREATEST(COUNT(DISTINCT e.event_id), 1), MIN(e.created_at)
FROM Rates r LEFT JOIN Events e ON r.id = e.rate_id
GROUP BY r.id

The GREATEST function is used to prevent division by zero for entries without any Events. If you are summing integers, you might also want to CAST the sums to INT.
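
Here is a small sketch of the effect, run on SQLite through Python's sqlite3 rather than on MySQL. SQLite has no GREATEST, so its two-argument scalar MAX is swapped in for it. The sample rows are hypothetical; the table and column names are taken from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rates (id INTEGER PRIMARY KEY, name TEXT, rate INTEGER);
CREATE TABLE Events (event_id INTEGER PRIMARY KEY, rate_id INTEGER, created_at TEXT);
-- Hypothetical data: rate 1 has three events, rate 2 has none.
INSERT INTO Rates VALUES (1, 'John Doe', 8), (2, 'Jane Roe', 7);
INSERT INTO Events VALUES
    (10, 1, '2022-01-03'), (11, 1, '2022-01-01'), (12, 1, '2022-01-02');
""")

# Naive version: the rate is repeated once per joined event.
naive = conn.execute("""
    SELECT r.name, SUM(r.rate), MIN(e.created_at)
    FROM Rates r LEFT JOIN Events e ON r.id = e.rate_id
    GROUP BY r.id ORDER BY r.id
""").fetchall()

# Corrected version: divide by the number of distinct events (at least 1).
# MAX(x, 1) is SQLite's scalar equivalent of MySQL's GREATEST(x, 1).
fixed = conn.execute("""
    SELECT r.name,
           SUM(r.rate) / MAX(COUNT(DISTINCT e.event_id), 1),
           MIN(e.created_at)
    FROM Rates r LEFT JOIN Events e ON r.id = e.rate_id
    GROUP BY r.id ORDER BY r.id
""").fetchall()

print(naive)
print(fixed)
```

The naive query reports 24 for the rate with three events, while the corrected one reports 8; the rate without events stays at 7 in both. Note that SQLite divides integers as integers, whereas MySQL's `/` returns a decimal, which is why the CAST mentioned above may be needed there.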

score 1 · Answer 6 · answered Aug 02 '12 at 17:01

SELECT SUM(rate)
FROM [TABLE] 
GROUP BY CONCAT_WS(' ', first_name, last_name);

answered Aug 02 '12 at 17:01

MetalFrog

Out of curiosity, why did you use `CONCAT_WS`? – David Aug 02 '12 at 17:05
To group the first and last name together with a separator. If you want it per each last name and each first name, you'd do `GROUP BY last_name, first_name`. That would provide sums for each last name, and each first name, not a full name. – MetalFrog Aug 02 '12 at 17:07
Thank you very much, that's extremely helpful on other parts of this application :) – David Aug 02 '12 at 17:08
@David there's quite a few [string functions](http://dev.mysql.com/doc/refman/5.0/en/string-functions.html). – MetalFrog Aug 02 '12 at 17:11
If you GROUP BY last_name, first_name, you will get a sum for each combination of First_Name, and Last_Name. You will get the same result – Gonzalo.- Aug 02 '12 at 17:11

score 0 · Answer 7 · answered Aug 02 '12 at 17:46

You can use any of the code samples above, since a GROUP BY clause without an aggregate function returns one indeterminate record for each group. See http://dev.mysql.com/doc/refman/5.5/en/group-by-hidden-columns.html for further reading.

answered Aug 02 '12 at 17:46

Manisha Mahawar

Link-only answers are bad for this reason in particular. Your link is now broken. – Buggabill May 25 '16 at 18:05

score 0 · Answer 8 · answered Apr 30 '21 at 07:12

I found this thread while looking for a better way than my solution, but I still didn't find a better one:

SELECT SUM(rate) FROM (SELECT DISTINCT rate, first_name, last_name) Q

answered Apr 30 '21 at 07:12

F. Terenzi

Your subquery is missing the `FROM` part. – El_Vanja Apr 30 '21 at 22:46
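
With the `FROM` added, the idea can be tried out as below. This is a sketch on SQLite (Python's sqlite3) with made-up rows, and it assumes the table is called `records` as in David's comment, since the answer itself does not name one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER, rate INTEGER, first_name TEXT, last_name TEXT)"
)
# Hypothetical data: two names, three rows each, one rate per name.
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, 8, "John", "Doe"), (2, 8, "John", "Doe"), (3, 8, "John", "Doe"),
        (4, 7, "Jane", "Roe"), (5, 7, "Jane", "Roe"), (6, 7, "Jane", "Roe"),
    ],
)

# The answer's query with the missing FROM clause filled in (table name assumed).
QUERY = (
    "SELECT SUM(rate) FROM "
    "(SELECT DISTINCT rate, first_name, last_name FROM records) Q"
)

consistent = conn.execute(QUERY).fetchone()[0]

# If one name ever carries two different rates, DISTINCT keeps both rows.
conn.execute("INSERT INTO records VALUES (7, 9, 'John', 'Doe')")
inconsistent = conn.execute(QUERY).fetchone()[0]

print(consistent, inconsistent)
```

The first result is 15. After the extra row the result becomes 24, since (8, John, Doe) and (9, John, Doe) are distinct rows, so this variant relies on each name always having the same rate.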
