Select all columns but ignore records having duplicate values on some of them

Question

Let's say I have a cars table. It has following columns: id, color, engine, price, upholstery, model_name. So far I've been querying for all cars based on criteria passed from the front end app. However, now I'm told that if multiple cars have the same value of engine, model_name and price they shouldn't be returned as they are duplicates. In this case, one car should be returned with a number of duplicates instead. So I probably need to do some sort of a group_by on engine, model_name, price but I still need to get values of the other (not-grouped) columns (id, color, upholstery) for each record, as well as duplicates count.

My API needs to stay compliant with the old one. Currently it returns:

[
  {id: 1, color: 'blue', engine: '2.0', price: 20000, upholstery: 'leather', model_name: 'outback'},
  {id: 2, color: 'red', engine: '2.5', price: 20000, upholstery: 'material', model_name: 'wrx'},
  {id: 3, color: 'yellow', engine: '2.5', price: 20000, upholstery: 'leather', model_name: 'wrx'},
]

Now it should be:

[
  {id: 1, color: 'blue', engine: '2.0', price: 20000, upholstery: 'leather', model_name: 'outback', count: 1},
  {id: 2, color: 'red', engine: '2.5', price: 20000, upholstery: 'material', model_name: 'wrx', count: 2},
]

Performance does matter in this case. In reality there's way more columns which are queried and joined from multiple tables.

score 1 · Accepted Answer · answered Dec 08 '19 at 20:08

1

You can do this with distinct on and an analytic function:

select distinct on (engine, model_name, price) c.*,
       count(*) over (partition by engine, model_name, price) as cnt
from cars c
order by engine, model_name, price;

answered Dec 08 '19 at 20:08

Gordon Linoff

1,242,037
58
646
786

score 0 · Answer 2 · answered Dec 08 '19 at 19:52

Try

WITH duplicates AS (
    SELECT
        MIN(id) AS min_id,
        engine,
        price,
        model_name,
        COUNT(*) AS number_of_duplicates
    FROM
        cars
    GROUP BY
        engine,
        model_name,
        price
)
SELECT
    c.id,
    c.color,
    d.engine,
    d.price,
    c.upholstery,
    d.model_name,
    d.number_of_duplicates
FROM
    cars c
    INNER JOIN duplicates d ON c.id = d.min_id

Barbaros Özhan · Answer 3 · 2019-12-08T20:12:38.000

0

You can use window analytic function count(..) over (partition by..) and row_number():

select id, color, engine, price, upholstery, model_name, count
  from
  (
    select c.*,
           count(*) over (partition by model_name) as count,
           row_number() over (partition by model_name order by id) as rn
      from cars c
  ) q
 where rn = 1

Demo

edited Dec 08 '19 at 20:12

answered Dec 08 '19 at 20:02

Barbaros Özhan

59,113
10
31
55

Select all columns but ignore records having duplicate values on some of them

3 Answers3