Dedupe records without DELETE

Question

I need to bring back only one of the records from a duplicated row in SQL Server

I have data like this

-------------------------------------------
CustomerID, OrderID, ProductID, Title
-------------------------------------------
1,1001,131,orange
1,1002,131,orange
-------------------------------------------

These rows are shown as 2 items that have been ordered by the same person, really they are just two as the quantity chosen in the basket and 2 records.

My question is how can i retrieve only one of these rows?

Thanks

score 5 · Accepted Answer · edited Feb 16 '12 at 16:44

5

Maybe something like this:

First some test data:

DECLARE @tbl TABLE(CustomerID INT,OrderID INT,ProductID INT,Title VARCHAR(100))

INSERT INTO @tbl
VALUES
    (1,1001,131,'orange'),
    (1,1002,131,'orange')

Then the query

;WITH CTE AS
(
    SELECT
        ROW_NUMBER() OVER(PARTITION BY tbl.CustomerID,tbl.ProductID,tbl.Title 
              ORDER BY tbl.OrderID) AS RowNbr,
        tbl.CustomerID,
        tbl.OrderID,
        tbl.ProductID,
        tbl.Title
    FROM
        @tbl AS tbl
)
SELECT
    *
FROM
    CTE
WHERE
    CTE.RowNbr=1

edited Feb 16 '12 at 16:44

Aaron Bertrand

272,866
37
466
490

answered Feb 16 '12 at 16:25

Arion

31,011
10
70
88

I don't think this achieves the desired results. Since you're partitioning by every single column, `iRank` will *ALWAYS* be `1`. Fix your sample data (change second `OrderID` to `1002`, for example, to match what the OP had in the question) and you'll see what I mean. – Aaron Bertrand Feb 16 '12 at 16:29
This is a very good strategy when the criteria for the first record gets complicated and involves many fields and calculation. – JeffO Feb 16 '12 at 16:32
@Jeff O : Yeah thats also true thats why I used a CTE in this case. Because I did not know that the OP wanted with the result – Arion Feb 16 '12 at 16:36
I prefer the CTE solution as well, it's not always just *one* column we're trying to de-dupe. The `MIN`/`MAX` tricks break down quickly. – Aaron Bertrand Feb 16 '12 at 16:41
@Aaron Bertrand: yeah totally agree. I also think that it is preferably because it is easy to read and change are not hard to implement because there is no group by. – Arion Feb 16 '12 at 17:08
@Sandeep Bansal: Glad to help – Arion Feb 16 '12 at 17:09

score 4 · Answer 2 · edited Feb 16 '12 at 16:37

4

This way you can get, not only one of both rows, but also the quantity ordered

SELECT 
   CustomerID, ProductID, Title, max(OrderID) as orderID, COUNT(*) as quantity
FROM 
  TableName 
GROUP BY 
   CustomerID, 
   ProductID, 
   Title

edited Feb 16 '12 at 16:37

Arion

31,011
10
70
88

answered Feb 16 '12 at 16:28

Ricardo Garvão

141
2

score 2 · Answer 3 · answered Feb 16 '12 at 16:24

2

Using Max will get you the most recent order

SELECT CustomerID, MAX(OrderId), ProductID, Title
FROM table
GROUP BY CustomerID, ProductID, Title

OR

Using Min will get you the first order

SELECT CustomerID, MIN(OrderId), ProductID, Title
FROM table
GROUP BY CustomerID, ProductID, Title

answered Feb 16 '12 at 16:24

Taryn

242,637
56
362
405

That won't do. The orderIDs are different. – PHeiberg Feb 16 '12 at 16:25

PHeiberg · Answer 4 · 2012-02-16T16:27:13.290

Provided that it's really what you want you can get the first order of each order with the same customer, product and title using a grouping and the MIN function (MAX would give you the last order):

SELECT CustomerID, MIN(OrderID) AS OrderID, ProductID, Title 
FROM MyTable
GROUP BY CustomerID, ProductID, Title

If you want the number of duplicate orders (that would be the ordered quantity judging by your question) you can add a count:

SELECT CustomerID, MIN(OrderID) AS OrderID, ProductID, Title, 
    COUNT(*) AS Quantity 
FROM MyTable
GROUP BY CustomerID, ProductID, Title

Dedupe records without DELETE

4 Answers4