I have a process stages I need to find when did they happen. Unfortunately I can only do it though text extraction. I have extracted dates that match the stages found in the associated text column. Unfortunately some text values are similar and were inputed to the table incorrectly, and so extracted dates are duplicated. I cannot edit the existing table, however, I am trying to create a new one based on the manipulations I am doing (text extract, date ordering, etc). I need to make sure that the dates are not mixed up and are consecutive. For example stage 1 date is earlier than stage 2 etc...
Data sample
Person_ID | Stage | Stage_Num | Stage_Date |
---|---|---|---|
1 | Stage-1 | 1 | 2020-01-01 |
1 | Stage-1 | 1 | 2021-02-02 |
1 | Stage-2 | 2 | 2020-05-01 |
1 | Stage-2 | 2 | 2020-05-06 |
1 | Stage-1-R | 3 | 2021-02-02 |
1 | Stage-1-R | 3 | 2020-01-01 |
Final result I am expecting is
Person_ID | Stage | Stage Num | Stage_Date |
---|---|---|---|
1 | Stage-1 | 1 | 2020-01-01 |
1 | Stage-2 | 2 | 2020-05-01 |
1 | Stage-1-R | 3 | 2021-02-02 |
I've tried the following:
select a.person_id,a.stage, a.stage_num, min(a.stage_date)
from
all_activity_dates a join all_activity_dates b
where
a.stage_num<=b.stage_num
and a.stage_date<b.stage_date
group by
a.person_id, a.stage, a.stage_num
order by stage_num
However, the result date for stage 1 and 3 turn out to be the same. How do I perform it correctly?
UPD1 To whoever closed my question: I cannot simply select the min and group by STAGE (I definitely know how to do that) In my original question I state I need consecutive order, but with simple MIN DATE STAGE 1 and STAGE 3 would have the SAME DATE , BUT STAGE 3 HAS to be AFTER STAGE 2.
UPD2 Plus for stages I am missing to extract date I'd need to add a random date between previous and next stage
If it is a duplicate, I. am glad to check a working solution, I just could not find a similarly worded question