i have data with 3 columns: date, id, sales. my first task is filtering sales above 100. i did it. second task, grouping id by consecutive days.
index | date | id | sales |
---|---|---|---|
0 | 01/01/2018 | 03 | 101 |
1 | 01/01/2018 | 07 | 178 |
2 | 02/01/2018 | 03 | 120 |
3 | 03/01/2018 | 03 | 150 |
4 | 05/01/2018 | 07 | 205 |
the result should be:
index | id | count |
---|---|---|
0 | 03 | 3 |
1 | 07 | 1 |
2 | 07 | 1 |
i need to do this task without using pandas/dataframe, but right now i can't imagine from which side attack this problem. just for effort, i tried the suggestion for a solution here count consecutive days python dataframe but the ids' not grouped. here is my code:
data = df[df['sales'] >= 100]
data['date'] = pd.to_datetime(data['date']).dt.date
s = data.groupby('id').date.diff().dt.days.ne(1).cumsum()
new_frame = data.groupby(['id', s]).size().reset_index(level=0, drop=True)
it is very importent that the "new_frame" will have "count" column, because after i need to count id by range of those count days in "count" column. e.g. count of id's in range of 0-7 days, 7-12 days etc. but it's not part of my question. Thank you a lot