SQL: select unique rows

Question

This is a "toy" example of a table that has many columns and 100s of thousands of rows.

I want FILTER OUT any rows containing the same AcctNo, CustomerName and CustomerContact, but KEEP the ID for ONE of the duplicates (so i can access the record later).

Example:

ID  AcctNo  CustomerName  CustomerContact
1   1111    Acme Foods    John Smith
2   1111    Acme Foods    John Smith
3   1111    Acme Foods    Judy Lawson
4   2222    YoyoDyne Inc  Thomas Pynchon
5   2222    YoyoDyne Inc  Thomas Pynchon
<= I want to save IDs 2, 3, and 5

Fiddle: https://www.db-fiddle.com/f/bEECHi6XnvKAeXC4Xthrrr/1

Q: What SQL do I need to accomplish this?

You might consider using the `row_number()` function. – Dale K Apr 07 '21 at 22:46 — Dale K, Apr 07 '21 at 22:46
Please share what sql you already tried. – Pradatta Apr 07 '21 at 22:47 — Pradatta, Apr 07 '21 at 22:47
You need the maximum Id for each group... – Stu Apr 07 '21 at 22:51 — Stu, Apr 07 '21 at 22:51

score 3 · Accepted Answer · answered Apr 07 '21 at 22:51

3

select MAX(ID) as KeepID,AcctNo,CustomerName,CustomerContact 
from test
GROUP BY AcctNo,CustomerName,CustomerContact

answered Apr 07 '21 at 22:51

Robert Sheahan

2,100
1
10
12

Pradatta · Answer 2 · 2021-04-08T09:01:14.800

1

So basically what you want is, partition your table by AcctNo, CustomerName and CustomerContact. It's unclear in the question how you want select which ID you need to keep, but for that you need to modify the the following query. But this should give you a starting point.

SELECT * 
FROM   test 
       JOIN (SELECT id, 
                    Row_number() 
                      OVER ( 
                        partition BY acctno, customername, customercontact) rn 
             FROM   test) A 
         ON test.id = A.id 
WHERE  A.rn = 1

This should return something like this:

ID	AcctNo	CustomerName	CustomerContact	id	rn
1	11111	Acme Foods	John Smith	1	1
3	11111	Acme Foods	Judy Lawson	3	1
4	22222	Yoyodyne Inc.	Thomas Pynchon	4	1

What this is doing is basically first calculating row num based on the partition criteria and then picking only one row per partition.

edited Apr 08 '21 at 09:01

answered Apr 07 '21 at 23:11

Pradatta

3,000
1
18
22

2

Please don't use images for data... use formatted/tabular text. – Dale K Apr 07 '21 at 23:20
@Pradatta - 1) In answer to your comment above, I tried many things ... but I didn't want to "clutter" my question with a bunch of failed attempts. 2) Regarding your reply: I *considered* partitions and row_number(), but I would have preferred a simpler solution. Like [Robert Sheahan's](https://stackoverflow.com/a/66995135/3135317). 3) Do you know which approach would be more "efficient" (for large-ish datasets)? – FoggyDay Apr 08 '21 at 00:11
1

@FoggyDay The MAX with Group By is more efficient I think. Here's a nice explanation why: https://stackoverflow.com/questions/11233125/is-there-a-performance-difference-in-using-a-group-by-with-max-as-the-aggregat – Pradatta Apr 08 '21 at 09:06
1

Excellent citation - Thank you. – FoggyDay Apr 08 '21 at 18:59

SQL: select unique rows

2 Answers2