1

I would like some help in turning this table with different uuids for same emails and get only the row with min(timestamp), eliminating the other ones

Sample data:

UUID   email       created_timestamp  
  1    a@g.com        2017-05-01
  2    a@g.com        2018-05-01
  3    a@g.com        2018-05-20
  4    b@g.com        2017-04-01
  5    b@g.com        2017-06-01

Expected output:

UUID   email       created_timestamp 
  1    a@g.com        2017-05-01
  4    b@g.com        2017-04-01

I have tried to use group by, but it makes me group by UUID, what does not make sense in this case

  • See [here](https://stackoverflow.com/questions/tagged/greatest-n-per-group+postgresql) for many, many answers –  May 22 '18 at 13:53
  • You have to be able to correctly, reliably and repeatably order them, ie Your `created_timestamp` seems like it may contain duplicated values. You can order by that, but then if you have two with the same value, which one is first? You may be able to use `UUID`, but the column name indicates that field may be a UUID datatype, and they don't always work in an orderable manner. To make sure you get what you need, make sure to test edge cases. – Shawn May 22 '18 at 14:13
  • Shawn, thank you for commenting. Actually it is not possible to have duplicate values on timestamp, because it is the moment of creation of the account, what, for the same email, cannot be exactly equal. – Flavio de Falcao e Helena May 22 '18 at 14:26

3 Answers3

0
SELECT
  *
FROM
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY email
                           ORDER BY created_timestamp
                      )
                        AS SequenceID
  FROM
    yourTable
)
  sorted
WHERE
  SequenceID = 1
MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • Hey Mat, thank you for answering. Do you have more information about this partition statement? I have been studying window for postgresql, but I am not very secure using it – Flavio de Falcao e Helena May 22 '18 at 14:28
  • @FlaviodeFalcaoeHelena: https://www.postgresql.org/docs/current/static/tutorial-window.html –  May 22 '18 at 14:30
  • It's like `GROUP BY` in that it gathers rows together if they share the same value in specific columns. But it doesn't aggregate them, the individual rows stay as individual rows. In the case of `ROW_NUMBER()` it means create separate sequences (starting from 1) for each different `email`. – MatBailie May 22 '18 at 14:32
0

You can use LIMIT clause with subquery :

select t.* 
from table t
where UUID = (select UUID
              from table t1
              where t1.email = t.email
              order by created_timestamp asc
              limit 1
             );

However, if the created_timestamp has no duplicate then you can express this as :

select t.* 
from table t
where created_timestamp = (select min(created_timestamp)
                           from table t1
                           where t1.email = t.email
                           );
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
0

I think the best way in Postgres is distinct on:

select distinct on (email) t.*
from t
order by email, created_timestamp desc;

In particular, with an index on (email, created_timestamp), this should be the fastest solution.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786