Problem
Consider the following dataframe:
import pandas as pd
from io import StringIO
data = StringIO(""",ID,category,app,duplicated
0,100,OperatingSystem,Finder,FALSE
1,100,Browser,Chrome,FALSE
2,100,Browser,Chrome,TRUE
3,100,Browser,Chrome,FALSE
4,100,Clipboard,Clipboard,FALSE
5,200,Browser,Chrome,FALSE
6,200,Browser,Chrome,TRUE
7,200,Browser,Chrome,FALSE
8,200,Clipboard,Clipboard,FALSE
9,300,OperatingSystem,Finder,FALSE
10,300,Browser,Chrome,TRUE
11,300,OperatingSystem,Finder,FALSE
12,300,Clipboard,Clipboard,FALSE
""" )
df_stack = pd.read_csv(data, index_col=0).fillna('')
Using the code from this answer
s = df_stack.groupby('ID')['duplicated'].apply(lambda d : d.ne(d.shift()).cumsum())
d_groups = df_stack.groupby(s)
for g, dataframes in d_groups:
print(f"\n[Group {g}] - {dataframes['duplicated'].unique()[0]} \n{'-'*20}")
for index,row in dataframes.iterrows():
print(index, row['duplicated'], row['category'], row['app'])
I get 3 different groups, depicted in the image below, following these conditions:
- Group A: for each ID, all False rows until the first True row of that ID
- Group B: all true rows
- Group C: for each ID, all False rows after the last True row of that ID
What I want
I want to group A,B,C by category in order to obtain 6 groups in total (right side of image).
EDIT: I solved this with a double for loop, looping over d_groups
and grouping the inner groups. Is there a more efficient solution?