7

I have a table with many duplicate items – Many rows with the same id, perhaps with the only difference being a requested_at column.

I'd like to do a select * from the table, but only return one row with the same id – the most recently requested.

I've looked into group by id but then I need to do an aggregate for each column. This is easy with requested_at – max(requested_at) as requested_at – but the others are tough.

How do I make sure I get the value for title, etc that corresponds to that most recently updated row?

Kevin Moore
  • 5,921
  • 2
  • 29
  • 43

2 Answers2

11

I suggest a similar form that avoids a sort in the window function:

SELECT *
    FROM (
      SELECT
          *,
          MAX(<timestamp_column>)
              OVER (PARTITION BY <id_column>)
              AS max_timestamp,
      FROM <table>
    )
    WHERE <timestamp_column> = max_timestamp
Matthew Wesley
  • 616
  • 3
  • 5
3

Try something like this:

    SELECT *
    FROM (
      SELECT
          *,
          ROW_NUMBER()
              OVER (
                  PARTITION BY <id_column>
                  ORDER BY <timestamp column> DESC)
              row_number,
      FROM <table>
    )
    WHERE row_number = 1

Note it will add a row_number column, which you might not want. To fix this, you can select individual columns by name in the outer select statement.

In your case, it sounds like the requested_at column is the one you want to use in the ORDER BY.

And, you will also want to use allow_large_results, set a destination table, and specify no flattening of results (if you have a schema with repeated fields).

Jordan Tigani
  • 26,089
  • 4
  • 60
  • 63