1

Let's say I have a cars table. It has following columns: id, color, engine, price, upholstery, model_name. So far I've been querying for all cars based on criteria passed from the front end app. However, now I'm told that if multiple cars have the same value of engine, model_name and price they shouldn't be returned as they are duplicates. In this case, one car should be returned with a number of duplicates instead. So I probably need to do some sort of a group_by on engine, model_name, price but I still need to get values of the other (not-grouped) columns (id, color, upholstery) for each record, as well as duplicates count.

My API needs to stay compliant with the old one. Currently it returns:

[
  {id: 1, color: 'blue', engine: '2.0', price: 20000, upholstery: 'leather', model_name: 'outback'},
  {id: 2, color: 'red', engine: '2.5', price: 20000, upholstery: 'material', model_name: 'wrx'},
  {id: 3, color: 'yellow', engine: '2.5', price: 20000, upholstery: 'leather', model_name: 'wrx'},
]

Now it should be:

[
  {id: 1, color: 'blue', engine: '2.0', price: 20000, upholstery: 'leather', model_name: 'outback', count: 1},
  {id: 2, color: 'red', engine: '2.5', price: 20000, upholstery: 'material', model_name: 'wrx', count: 2},
]

Performance does matter in this case. In reality there's way more columns which are queried and joined from multiple tables.

Jacka
  • 2,270
  • 4
  • 27
  • 34

3 Answers3

1

You can do this with distinct on and an analytic function:

select distinct on (engine, model_name, price) c.*,
       count(*) over (partition by engine, model_name, price) as cnt
from cars c
order by engine, model_name, price;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Try

WITH duplicates AS (
    SELECT
        MIN(id) AS min_id,
        engine,
        price,
        model_name,
        COUNT(*) AS number_of_duplicates
    FROM
        cars
    GROUP BY
        engine,
        model_name,
        price
)
SELECT
    c.id,
    c.color,
    d.engine,
    d.price,
    c.upholstery,
    d.model_name,
    d.number_of_duplicates
FROM
    cars c
    INNER JOIN duplicates d ON c.id = d.min_id
Steffen Bobek
  • 582
  • 1
  • 5
  • 18
0

You can use window analytic function count(..) over (partition by..) and row_number():

select id, color, engine, price, upholstery, model_name, count
  from
  (
    select c.*,
           count(*) over (partition by model_name) as count,
           row_number() over (partition by model_name order by id) as rn
      from cars c
  ) q
 where rn = 1

Demo

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55