
I have a table with the following schema:

id itemid date        some additional data
1   1000  10/12/2020       a
2   1000  10/12/2020       b
3   1002  09/12/2020       c
4   1001  07/12/2020       d
5   1000  05/12/2020       e
6   1005  03/12/2020       f
7   1003  03/12/2020       g

In this table only the id column is unique. I want to get the rows belonging to the last X distinct itemids, ordered by date (newest first).

For example, in the sample above, if I want the last 3 distinct itemids, I should get the first 4 rows, because those 4 rows contain three distinct itemids: 1000, 1002 and 1001. I'm not sure how to achieve this with a single SQL statement.
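The rule described above (walk the rows newest first and stop at the row that brings the number of distinct itemids up to X) can be pinned down with a small Python reference sketch. It assumes the dates are day/month/year (reading them as month/day gives the same order here) and breaks ties on date by id; `rows_until_x_distinct` is a hypothetical helper, not part of any library.

```python
from datetime import date

# Sample rows from the question as (id, itemid, date, data).
# Assumption: dates are day/month/year; ties on date are broken by id.
ROWS = [
    (1, 1000, date(2020, 12, 10), "a"),
    (2, 1000, date(2020, 12, 10), "b"),
    (3, 1002, date(2020, 12, 9), "c"),
    (4, 1001, date(2020, 12, 7), "d"),
    (5, 1000, date(2020, 12, 5), "e"),
    (6, 1005, date(2020, 12, 3), "f"),
    (7, 1003, date(2020, 12, 3), "g"),
]


def rows_until_x_distinct(rows, x):
    """Hypothetical helper: take rows newest first and stop once x distinct
    itemids have been collected (the row that reaches x is included)."""
    seen = set()
    result = []
    for row in sorted(rows, key=lambda r: (-r[2].toordinal(), r[0])):
        if len(seen) == x:
            break  # x distinct itemids already collected
        seen.add(row[1])
        result.append(row)
    return result


print([r[0] for r in rows_until_x_distinct(ROWS, 3)])  # [1, 2, 3, 4]
```

Row 5 is left out even though it belongs to itemid 1000, because the walk stops as soon as the third distinct itemid (1001, row 4) has been seen.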

Alk
  • I don't understand this. If you want to get distinct itemids, why do you expect the first 4 rows, since that would repeat itemid = 1000? –  Jan 25 '21 at 12:27
  • There are additional columns in the table that distinguish rows 1 and 2. I want all the data related to the last X distinct itemids; let me update the schema. – Alk Jan 25 '21 at 12:28
  • Just a note, tables have _columns_, not fields. – jarlh Jan 25 '21 at 12:34
  • yep, my bad.... – Alk Jan 25 '21 at 12:34

2 Answers


If I understand correctly, you want to count the number of distinct itemids up to each row (by date, newest first) and return all rows where that count is at most three.

If Postgres supported COUNT(DISTINCT) as a window function, you could use:

select t.*
from (select t.*,
             count(distinct itemid) over (order by date desc) as cnt_itemid
      from t
     ) t
where cnt_itemid <= 3;

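Alas, Postgres does not support COUNT(DISTINCT) as a window function. But you can emulate it: number the rows of each itemid from newest to oldest with ROW_NUMBER(), then keep a running count of the rows numbered 1, since each of those is the first appearance of an itemid:

select t.*
from (select t.*,
             -- running count of first appearances = distinct itemids so far
             count(*) filter (where seqnum = 1) over (order by date desc, id) as cnt_itemid
      from (select t.*,
                   -- id breaks ties between rows with the same date
                   row_number() over (partition by itemid order by date desc, id) as seqnum
            from t
           ) t
     ) t
where cnt_itemid <= 3;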

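However, this returns every row that comes before the first row of the fourth distinct itemid, so it has extra rows: row 5 (itemid 1000, 05/12/2020) also has a count of 3, giving five rows rather than four.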
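The running count can be checked outside Postgres with a sketch that uses Python's built-in `sqlite3` module on the sample data. SQLite 3.25 or later has window functions; `SUM(CASE ...)` stands in for Postgres's `COUNT(*) FILTER (...)` so that SQLite builds without FILTER support still run it. The table name `t`, the ISO date strings and the id tie-breaker are assumptions made for the demo.

```python
import sqlite3

# Sample rows from the question as (id, itemid, date, data).
# Assumption: dates stored as ISO text so string order is chronological.
ROWS = [
    (1, 1000, "2020-12-10", "a"),
    (2, 1000, "2020-12-10", "b"),
    (3, 1002, "2020-12-09", "c"),
    (4, 1001, "2020-12-07", "d"),
    (5, 1000, "2020-12-05", "e"),
    (6, 1005, "2020-12-03", "f"),
    (7, 1003, "2020-12-03", "g"),
]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, itemid INTEGER, date TEXT, data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# seqnum = 1 marks the newest row of each itemid; the running SUM of those
# marks is the number of distinct itemids seen so far (newest first).
RUNNING_COUNT_SQL = """
SELECT id, itemid, cnt_itemid
FROM (SELECT t.*,
             SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)
                 OVER (ORDER BY date DESC, id) AS cnt_itemid
      FROM (SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY itemid
                                      ORDER BY date DESC, id) AS seqnum
            FROM t) t
     ) t
WHERE cnt_itemid <= 3
ORDER BY date DESC, id
"""

rows = conn.execute(RUNNING_COUNT_SQL).fetchall()
for row in rows:
    print(row)
print([r[0] for r in rows])  # [1, 2, 3, 4, 5]
```

The printout shows row 5 (itemid 1000) passing the filter with a count of 3, which is the extra row.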

To get exactly four rows, you want every row up to and including the first row where the distinct count reaches 3. One method is to number the rows and cut at that row's position:

select t.*
from (select t.*,
             min(rn) filter (where cnt_itemid = 3) over () as cutoff_rn
      from (select t.*,
                   count(*) filter (where seqnum = 1) over (order by date desc, id) as cnt_itemid,
                   row_number() over (order by date desc, id) as rn
            from (select t.*,
                         row_number() over (partition by itemid order by date desc, id) as seqnum
                  from t
                 ) t
           ) t
     ) t
where rn <= cutoff_rn;
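The cutoff idea can take X as a parameter instead of the literal 3. This sqlite3 sketch binds it as the hypothetical named parameter `:x` and wraps the cutoff in COALESCE: when the table has fewer than X distinct itemids no row has `cnt_itemid = :x`, the MIN is NULL, and without COALESCE nothing would be returned. It assumes x >= 1, ISO date strings, and id as the tie-breaker; `last_x_distinct_ids` is a hypothetical helper.

```python
import sqlite3

# Sample rows from the question as (id, itemid, date, data).
# Assumption: dates stored as ISO text so string order is chronological.
ROWS = [
    (1, 1000, "2020-12-10", "a"),
    (2, 1000, "2020-12-10", "b"),
    (3, 1002, "2020-12-09", "c"),
    (4, 1001, "2020-12-07", "d"),
    (5, 1000, "2020-12-05", "e"),
    (6, 1005, "2020-12-03", "f"),
    (7, 1003, "2020-12-03", "g"),
]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, itemid INTEGER, date TEXT, data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# rn is each row's position newest first; cutoff_rn is the position of the
# first row whose running distinct count equals :x (NULL if none does).
CUTOFF_SQL = """
SELECT id
FROM (SELECT t.*,
             MIN(CASE WHEN cnt_itemid = :x THEN rn END) OVER () AS cutoff_rn
      FROM (SELECT t.*,
                   SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)
                       OVER (ORDER BY date DESC, id) AS cnt_itemid,
                   ROW_NUMBER() OVER (ORDER BY date DESC, id) AS rn
            FROM (SELECT t.*,
                         ROW_NUMBER() OVER (PARTITION BY itemid
                                            ORDER BY date DESC, id) AS seqnum
                  FROM t) t
           ) t
     ) t
WHERE rn <= COALESCE(cutoff_rn, rn)
ORDER BY rn
"""


def last_x_distinct_ids(x):
    """Hypothetical helper: ids of rows up to the x-th distinct itemid (x >= 1)."""
    return [row[0] for row in conn.execute(CUTOFF_SQL, {"x": x})]


print(last_x_distinct_ids(3))  # [1, 2, 3, 4]
```

Asking for 20 itemids on the sample, which only has five, returns all seven rows rather than an empty result.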

You can also do this by finding the date of the most recent occurrence of the third item, then choosing all rows of the three most recent items on or after that date:

select t.*
from t join
     (select itemid, min(max_date) over () as min_max_date
      from (select t.itemid, max(date) as max_date
            from t
            group by t.itemid
            order by max(t.date) desc
            limit 3
           ) t
      ) tt
      on t.itemid = tt.itemid and t.date >= tt.min_max_date;

This fiddle shows each of these.
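The join approach also ports to SQLite, with X passed as the LIMIT. In this sketch the LIMIT is a bind parameter and `last_x_items_ids` is a hypothetical helper; the ISO date strings are again an assumption. Two behaviours follow directly from the query: rows that share the boundary date are all kept, so with X = 1 both rows dated 10/12/2020 come back, and if several itemids share the boundary max date, LIMIT picks among them arbitrarily.

```python
import sqlite3

# Sample rows from the question as (id, itemid, date, data).
# Assumption: dates stored as ISO text so string order is chronological.
ROWS = [
    (1, 1000, "2020-12-10", "a"),
    (2, 1000, "2020-12-10", "b"),
    (3, 1002, "2020-12-09", "c"),
    (4, 1001, "2020-12-07", "d"),
    (5, 1000, "2020-12-05", "e"),
    (6, 1005, "2020-12-03", "f"),
    (7, 1003, "2020-12-03", "g"),
]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, itemid INTEGER, date TEXT, data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# latest: the X itemids with the newest max date; min_max_date is the oldest
# of those X dates, and only rows on or after it are kept.
JOIN_SQL = """
SELECT t.id
FROM t
JOIN (SELECT itemid, MIN(max_date) OVER () AS min_max_date
      FROM (SELECT itemid, MAX(date) AS max_date
            FROM t
            GROUP BY itemid
            ORDER BY MAX(date) DESC
            LIMIT ?) latest
     ) tt
  ON t.itemid = tt.itemid AND t.date >= tt.min_max_date
ORDER BY t.date DESC, t.id
"""


def last_x_items_ids(x):
    """Hypothetical helper: ids of rows for the x most recent itemids."""
    return [row[0] for row in conn.execute(JOIN_SQL, (x,))]


print(last_x_items_ids(3))  # [1, 2, 3, 4]
```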

Gordon Linoff
  • Three is just a number I used to illustrate the example. What I'm trying to achieve is to get all the data concerning the last X distinct itemids. To illustrate: suppose the itemids are articles and the rows in this table are comments on articles; I would like to obtain all the comments (rows) made on the most recent X articles, for example all the comments made on the last 20 articles. – Alk Jan 25 '21 at 12:32
  • So the query would be "order the table by date descending, keep selecting rows until you've counted X distinct itemids" – Alk Jan 25 '21 at 12:34

You can use an analytic function as follows:

select * from
(select t.*,
       count(distinct itemid) over (order by date desc) as cnt
  from your_Table t) t
 where cnt <= 3
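PostgreSQL does not accept DISTINCT inside a window aggregate (as the other answer points out), so the query above fails there as written. One portable substitute keeps the same `cnt <= 3` filter but computes `cnt` with a correlated subquery that counts the distinct itemids among rows at or before the current one, newest first. Breaking ties on date by id is an assumption, and the subquery rescans `t` once per row, so this suits small tables. A sketch with Python's `sqlite3` on the sample data:

```python
import sqlite3

# Sample rows from the question as (id, itemid, date, data).
# Assumption: dates stored as ISO text so string order is chronological.
ROWS = [
    (1, 1000, "2020-12-10", "a"),
    (2, 1000, "2020-12-10", "b"),
    (3, 1002, "2020-12-09", "c"),
    (4, 1001, "2020-12-07", "d"),
    (5, 1000, "2020-12-05", "e"),
    (6, 1005, "2020-12-03", "f"),
    (7, 1003, "2020-12-03", "g"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, itemid INTEGER, date TEXT, data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# For each row (cur), count distinct itemids over rows that are newer, or
# same-dated with an id <= cur.id; no window function is needed.
CORRELATED_SQL = """
SELECT id
FROM (SELECT cur.id, cur.date,
             (SELECT COUNT(DISTINCT prev.itemid)
              FROM t AS prev
              WHERE prev.date > cur.date
                 OR (prev.date = cur.date AND prev.id <= cur.id)) AS cnt
      FROM t AS cur) counted
WHERE cnt <= 3
ORDER BY date DESC, id
"""

ids = [row[0] for row in conn.execute(CORRELATED_SQL)]
print(ids)  # [1, 2, 3, 4, 5]
```

Like the `cnt_itemid <= 3` query in the other answer, this keeps row 5, because itemid 1000 has already been counted when that row is reached.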
Popeye