Similar question found here but a little twist that I cant solve, and an error that I keep getting. Is it possible for the code to look at the last item in the group and depending on if a condition exits, it will drop the whole group? The condition is: drop entire group if the last row has a 'Status' between 30 and 39 inclusive.
I used a similar approach as found in the link above using apply() and filter(). When I use the apply method, this code removes just that row that has that condition but not the complete group:
df.groupby('Sort_Key').apply(lambda x: x[~x['Status'].between(30,39,inclusive=True)])
but when I use filter method, as suggested in the link above, I get an error
df.groupby('Sort_Key').filter(lambda x: x[~x['Status'].between(30,39,inclusive=True)])
TypeError: filter function returned a Series, but expected a scalar bool
So 1st, what is wrong with my filter approach? Is this the best approach for this problem?
2nd. Is there a way to base the conditional on the final row in that series.
Data Set:
df = pd.DataFrame({'Sort_Key': ['100000003', '100000009', '100000009', '100000009', '100000034','100000034', '100000034', '100000048'],
'Claim_no': [40000, 50000, 42000, 50000, 42000, 50000, 42000, 5000],
'Discharge': [456435, 50345345, 4345435,345756,34557,6867456,345435,346546],
'Admit': [678678, 67867867, 678678,678679,8989,67867,89697,9678678],
'Status': [12, 12, 52,30,30,18,19,30]})
Original dataframe:
Admit Claim_no Discharge Sort_Key Status
Sort_Key
100000003 0 678678 40000 456435 100000003 12
100000009 1 67867867 50000 50345345 100000009 12
2 678678 42000 4345435 100000009 52
3 678679 50000 345756 100000009 30
100000034 4 8989 42000 34557 100000034 30
5 67867 50000 6867456 100000034 18
6 89697 42000 345435 100000034 19
100000048 7 9678678 5000 346546 100000048 30
Final dataframe:
Admit Claim_no Discharge Sort_Key Status
Sort_Key
100000003 0 678678 40000 456435 100000003 12
100000034 4 8989 42000 34557 100000034 30
5 67867 50000 6867456 100000034 18
6 89697 42000 345435 100000034 19