-1

I have a table similar to this (without the attempt column):

user_id status date attempt
1 completed 5/4/2022 1
2 completed 5/5/2022 1
1 failed 5/6/2022 1
1 failed 5/7/2022 2
2 completed 5/8/2022 1
1 completed 5/9/2022 3
2 failed 5/10/2022 1
2 completed 5/11/2022 2
2 completed 5/12/2022 1
2 completed 5/13/2022 1

I'm trying to figure out how to create the "attempt" column. Basically the attempt number should reset to 0 on a user's next attempt a "completed" status, specific to user_id. The example is small but the actual dataset I'm working with is large so I can't do anything super computationally intensive. I have access to the data in SQL or Python, whichever is easier

I was able to get a correct "attempt" column for a single user with the answer from this question: Count rows with positive values and reset if negative but it didn't work when the dataframe contained multiple users.

I haven't found anything that works in SQL either. This was my most recent effort: textquestion:Attempt number over transitory statuses in T-SQL

GMB
  • 216,147
  • 25
  • 84
  • 135
nleffell
  • 1
  • 1

1 Answers1

1

We can address this as a gaps-and-islands problem. One way to solve it is to define groups with a window count of "previously" completed statuses, then enumerate records within each group:

select user_id, status, date,
    row_number() over(partition by user_id, grp order by date) attempt
from (
    select t.*,
        sum(case when status = 'completed' then 1 else 0 end) over(
            partition by user_id order by date 
            rows between unbounded preceding and 1 preceding
        ) grp
    from mytable t
) t
order by date
user_id status date attempt
1 completed 2022-05-04 1
2 completed 2022-05-05 1
1 failed 2022-05-06 1
1 failed 2022-05-07 2
2 completed 2022-05-08 1
1 completed 2022-05-09 3
2 failed 2022-05-10 1
2 completed 2022-05-11 2
2 completed 2022-05-12 1
2 completed 2022-05-13 1

fiddle

GMB
  • 216,147
  • 25
  • 84
  • 135