How to select only the first rows for each unique value of a column?

Question

Let's say I have a table of customer addresses:

+-----------------------+------------------------+
|         CName         |      AddressLine       |
+-----------------------+------------------------+
|  John Smith           |  123 Nowheresville     |
|  Jane Doe             |  456 Evergreen Terrace |
|  John Smith           |  999 Somewhereelse     |
|  Joe Bloggs           |  1 Second Ave          |
+-----------------------+------------------------+

In the table, one customer like John Smith can have multiple addresses. I need the SELECT query for this table to return only first row found where there are duplicates in 'CName'. For this table it should return all rows except the 3rd (or 1st - any of those two addresses are okay but only one can be returned).

Is there a keyword I can add to the SELECT query to filter based on whether the server has already seen the column value before?

score 177 · Accepted Answer · answered Jan 11 '11 at 20:50

177

A very simple answer if you say you don't care which address is used.

SELECT
    CName, MIN(AddressLine)
FROM
    MyTable
GROUP BY
    CName

If you want the first according to, say, an "inserted" column then it's a different query

SELECT
    M.CName, M.AddressLine,
FROM
    (
    SELECT
        CName, MIN(Inserted) AS First
    FROM
        MyTable
    GROUP BY
        CName
    ) foo
    JOIN
    MyTable M ON foo.CName = M.CName AND foo.First = M.Inserted

answered Jan 11 '11 at 20:50

gbn

422,506
82
585
676

Though it may not be intended to be used this way when selecting 10 columns. Also seems it cannot accept a column of the bit type. – nuit9 Jan 11 '11 at 21:43
2

@nuit9: of course it won't work with bit and 10 columns. Neither of these facts is in your question. You'd use the 2nd technique or Ben Thul's technique. I answered what you asked specifically, with pointers on how to solve more generally. – gbn Jan 12 '11 at 04:53
The first part DO work with multiple columns, athough not with bit-type columns. I tested this in MS SQL server 2016 though. – netfed Sep 08 '18 at 23:59
1

This answer works for many database platforms. – TheLegendaryCopyCoder Jan 19 '22 at 06:38

FatihAkici · Answer 2 · 2023-08-23T17:06:11.787

38

You can use the row_number() over(partition by ...) syntax like so:

select * from
(
select *
, ROW_NUMBER() OVER(PARTITION BY CName ORDER BY AddressLine) AS rownum
from myTable
) as a
where rownum = 1

What this does is that it creates a column called row, which is a counter that increments every time it sees the same CName, and indexes those occurrences by AddressLine. By imposing where row = 1, one can select the CName whose AddressLine comes first alphabetically. If the order by was desc, then it would pick the CName whose AddressLine comes last alphabetically.

edited Aug 23 '23 at 17:06

answered Mar 01 '18 at 11:49

FatihAkici

4,679
2
31
48

3

This has the added benefit of not restricting you to only the first row. In my case I was actually looking for the first 3 occurances as a means of sense checking. The last line would just be `where row < 4` – Morvael Sep 02 '21 at 13:52
'row' is a reserved word in mysql. I recommend a different variable on line 4 and 7 – deweydb Aug 09 '23 at 23:28

score 34 · Answer 3 · edited Jan 22 '16 at 00:49

34

In SQL 2k5+, you can do something like:

;with cte as (
  select CName, AddressLine,
  rank() over (partition by CName order by AddressLine) as [r]
  from MyTable
)
select CName, AddressLine
from cte
where [r] = 1

edited Jan 22 '16 at 00:49

Shiva

20,575
14
82
112

answered Jan 11 '11 at 21:08

Ben Thul

31,080
4
45
68

10

Please explain what does rank, partition and [r] do – Roberto Sep 25 '15 at 18:11

Frank · Answer 4 · 2013-11-19T18:32:48.373

8

You can use row_number() to get the row number of the row. It uses the over command - the partition by clause specifies when to restart the numbering and the order by selects what to order the row number on. Even if you added an order by to the end of your query, it would preserve the ordering in the over command when numbering.

select *
from mytable
where row_number() over(partition by Name order by AddressLine) = 1

edited Nov 19 '13 at 18:32

answered Apr 18 '13 at 00:35

Frank

693
8
12

8

In postgresql, window functions are not allowed in WHERE clause – ekanna May 10 '15 at 02:14
7

This is neither allowed for MS-SQL. – Mixxiphoid Aug 02 '16 at 06:37
3

`ROW_NUMBER()` doesn't work in `Where` clause in Teradata as well – Pirate X Aug 19 '16 at 10:20

netfed · Answer 5 · 2018-09-09T00:26:15.050

4

This will give you one row of each duplicate row. It will also give you the bit-type columns, and it works at least in MS Sql Server.

(select cname, address 
from (
  select cname,address, rn=row_number() over (partition by cname order by cname) 
  from customeraddresses  
) x 
where rn = 1) order by cname

If you want to find all the duplicates instead, just change the rn= 1 to rn > 1. Hope this helps

edited Sep 09 '18 at 00:26

answered Sep 09 '18 at 00:11

netfed

602
8
18

I get SQL compilation error: error line 3 at position 25 invalid identifier 'RN' with this solution – Garglesoap Jan 27 '21 at 01:57

score -2 · Answer 6 · edited Feb 23 '23 at 12:02

-2

select amount 
from (
  select distinct(amount) 
  from orders 
  order by amount desc 
  limit 3
) 
order by amount asc 
limit 1;

edited Feb 23 '23 at 12:02

smonff

3,399
3
36
46

answered Jul 30 '22 at 03:29

Gobi

283
1
5
15

score -3 · Answer 7 · answered Feb 03 '21 at 14:20

-3

to get every unique value from your customer table, use

SELECT DISTINCT CName FROM customertable;

more in-depth of w3schools: https://www.w3schools.com/sql/sql_distinct.asp

answered Feb 03 '21 at 14:20

Martin WasGehtSieDasAn

309
2
12

How to select only the first rows for each unique value of a column?

7 Answers7

Linked

Related