4

Problem

Consider the following dataframe:

import pandas as pd
from io import StringIO 
data = StringIO(""",ID,category,app,duplicated
    0,100,OperatingSystem,Finder,FALSE
    1,100,Browser,Chrome,FALSE
    2,100,Browser,Chrome,TRUE
    3,100,Browser,Chrome,FALSE
    4,100,Clipboard,Clipboard,FALSE
    5,200,Browser,Chrome,FALSE
    6,200,Browser,Chrome,TRUE
    7,200,Browser,Chrome,FALSE
    8,200,Clipboard,Clipboard,FALSE
    9,300,OperatingSystem,Finder,FALSE
    10,300,Browser,Chrome,TRUE
    11,300,OperatingSystem,Finder,FALSE
    12,300,Clipboard,Clipboard,FALSE
""" )
df_stack = pd.read_csv(data, index_col=0).fillna('')

Using the code from this answer

s = df_stack.groupby('ID')['duplicated'].apply(lambda d : d.ne(d.shift()).cumsum())
d_groups = df_stack.groupby(s)
for g, dataframes in d_groups:
    print(f"\n[Group {g}] - {dataframes['duplicated'].unique()[0]} \n{'-'*20}")
    for index,row in dataframes.iterrows():
        print(index, row['duplicated'], row['category'], row['app'])

I get 3 different groups, depicted in the image below, following these conditions:

  • Group A: for each ID, all False rows until the first True row of that ID
  • Group B: all true rows
  • Group C: for each ID, all False rows after the last True row of that ID

What I want

I want to group A,B,C by category in order to obtain 6 groups in total (right side of image).

enter image description here

EDIT: I solved this with a double for loop, looping over d_groups and grouping the inner groups. Is there a more efficient solution?

Ford1892
  • 741
  • 2
  • 9
  • 20

1 Answers1

3

I think you just need need to groupby the dataframe df_stack on s along with category to get the desired result, so instead of grouping only by s:

d_groups = df_stack.groupby(s)

You should groupby on s along with category column:

d_groups = df_stack.groupby([s, 'category'])

Result:

[Group (1, 'Browser')] - False 
--------------------
1 100 False Browser Chrome
5 200 False Browser Chrome

[Group (1, 'OperatingSystem')] - False 
--------------------
0 100 False OperatingSystem Finder
9 300 False OperatingSystem Finder

[Group (2, 'Browser')] - True 
--------------------
2 100 True Browser Chrome
6 200 True Browser Chrome
10 300 True Browser Chrome

[Group (3, 'Browser')] - False 
--------------------
3 100 False Browser Chrome
7 200 False Browser Chrome

[Group (3, 'Clipboard')] - False 
--------------------
4 100 False Clipboard Clipboard
8 200 False Clipboard Clipboard
12 300 False Clipboard Clipboard

[Group (3, 'OperatingSystem')] - False 
--------------------
11 300 False OperatingSystem Finder
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53