Group pandas dataframe in unusual way

Question

Problem

I have the following Pandas dataframe:

    data = {
        'ID':  [100, 100, 100, 100, 200, 200, 200, 200, 200, 300, 300, 300, 300, 300],
        'value': [False, False, True, False, False, True, True, True, False, False, False, True, True, False],
    }
    df = pandas.DataFrame (data, columns = ['ID','value'])

I want to get the following groups:

Group 1: for each ID, all False rows until the first True row of that ID
Group 2: for each ID, all False rows after the last True row of that ID
Group 3: all true rows

Can this be done with pandas?

What I've tried

I've tried

group = df.groupby((df['value'].shift() != df['value']).cumsum())

but this returns an incorrect result.

- Group 1: for each ID, all False rows until the first True row of that ID. - Group 2: for each ID, all False rows after the last True row of that ID. - Group 3: all true rows. — Ford1892, Sep 29 '20 at 15:05

score 9 · Accepted Answer · answered Sep 29 '20 at 15:10

Let us try shift + cumsum create the groupby key: BTW I really like the way you display your expected output

s = df.groupby('ID')['value'].apply(lambda x : x.ne(x.shift()).cumsum())
d = {x : y for x ,y in df.groupby(s)}
d[2]
     ID  value
2   100   True
5   200   True
6   200   True
7   200   True
11  300   True
12  300   True
d[1]
     ID  value
0   100  False
1   100  False
4   200  False
9   300  False
10  300  False
d[3]
     ID  value
3   100  False
8   200  False
13  300  False

score 2 · Answer 2 · answered Sep 29 '20 at 15:14

Let's try following your logic:

# 1. all False up to first True
group1 = df.loc[df.groupby('ID')['value'].cumsum() == 0]

# 2. all False after last True
group2 = df.loc[df.iloc[::-1].groupby('ID')['value'].cumsum()==0]

# 3. all True
group3 = df[df['value']]

Output:

    ID      value
0   100     False
1   100     False
4   200     False
9   300     False
10  300     False

    ID      value
3   100     False
8   200     False
13  300     False

    ID      value
2   100     True
5   200     True
6   200     True
7   200     True
11  300     True
12  300     True

score 1 · Answer 3 · answered Sep 29 '20 at 15:10

This works for your example data

df['groups'] = df.groupby('ID').value.apply(lambda x: x.diff().ne(False).cumsum()).astype('int')
for _,df_groups in df.groupby('groups'):
  print(df_groups)
  print('-'*20)

Out:

     ID  value  groups
0   100  False       1
1   100  False       1
4   200  False       1
9   300  False       1
10  300  False       1
--------------------
     ID  value  groups
2   100   True       2
5   200   True       2
6   200   True       2
7   200   True       2
11  300   True       2
12  300   True       2
--------------------
     ID  value  groups
3   100  False       3
8   200  False       3
13  300  False       3
--------------------

Group pandas dataframe in unusual way

Problem

What I've tried

3 Answers3

Linked