0

I want to return all rows that were public in May (2019-05), so if a row was turned to draft (and not back to public) at any point before the end of May, I don't want it. For example:

id | post_id | status | date
-------------------------
 1 | 1       | draft  | 2019-03-25
 2 | 1       | public | 2019-04-02
 3 | 1       | draft  | 2019-05-25
 4 | 2       | draft  | 2019-03-10
 5 | 2       | public | 2019-04-01
 6 | 2       | draft  | 2019-06-01

The desired result for the above would return post_id 2 because its last status change prior to the end of May was to public.

post_id 1 was put back in draft before the end of May, so it would not be included.

I'm not sure how to use the correct join or sub-queries to do this as efficiently as possible.

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55
HWD
  • 1,547
  • 7
  • 36
  • 72

2 Answers2

1

You seem to want the status as of 2019-05-31. A correlated subquery seems like the simplest solution:

select t.*
from t
where t.date = (select max(t2.date)
                from t t2
                where t2.post_id = t.post_id and
                      t2.date <= '2019-05-31'
               );

To get the ones that are public, just add a WHERE condition:

select t.*
from t
where t.date = (select max(t2.date)
                from t t2
                where t2.post_id = t.post_id and
                      t2.date <= '2019-05-31'
               ) and
      t.status = 'public';

For performance, you want an index on (post_id, date).

You can also phrase this using a JOIN:

select t.*
from t join
     (select t2.post_id, max(t2.date) as max_date
      from t t2
      where t2.date <= '2019-05-31'
      group by t2.post_id
     ) t2
     on t2.max_date = t.date
where t.status = 'public';

I would expect the correlated subquery to have better performance with the right indexes. However, sometimes MySQL surprises me.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

we need to determine whether

  1. the status of each post_id is public prior to the month May (the subquery with max(date)),
  2. any post_id exists with status not equals public within the month May,
  3. and then exclude the post_id satisfying the matter 2.

So, you can use :

select distinct t1.post_id
  from tab t1
where t1.post_id not in
    (
     select distinct t1.post_id
       from tab t1
       join
       (
        select post_id, max(date) as date
          from tab 
         where '2019-05-01'> date
         group by post_id ) t2
         on t1.post_id = t2.post_id 
      where t1.status != 'public' 
        and t1.date < '2019-06-01' 
        and t1.date > '2019-04-30'
);

+---------+
| POST_ID |
+---------+
|    2    |
+---------+

Demo

Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55