15

Problem

I have the following Pandas dataframe:

    data = {
        'ID':  [100, 100, 100, 100, 200, 200, 200, 200, 200, 300, 300, 300, 300, 300],
        'value': [False, False, True, False, False, True, True, True, False, False, False, True, True, False],
    }
    df = pandas.DataFrame (data, columns = ['ID','value'])

I want to get the following groups:

  • Group 1: for each ID, all False rows until the first True row of that ID
  • Group 2: for each ID, all False rows after the last True row of that ID
  • Group 3: all true rows

enter image description here

Can this be done with pandas?

What I've tried

I've tried

group = df.groupby((df['value'].shift() != df['value']).cumsum())

but this returns an incorrect result.

Ford1892
  • 741
  • 2
  • 9
  • 20
  • 1
    - Group 1: for each ID, all False rows until the first True row of that ID. - Group 2: for each ID, all False rows after the last True row of that ID. - Group 3: all true rows. – Ford1892 Sep 29 '20 at 15:05
  • Do you ever have `False` between the `True`'s? – Quang Hoang Sep 29 '20 at 15:17

3 Answers3

9

Let us try shift + cumsum create the groupby key: BTW I really like the way you display your expected output

s = df.groupby('ID')['value'].apply(lambda x : x.ne(x.shift()).cumsum())
d = {x : y for x ,y in df.groupby(s)}
d[2]
     ID  value
2   100   True
5   200   True
6   200   True
7   200   True
11  300   True
12  300   True
d[1]
     ID  value
0   100  False
1   100  False
4   200  False
9   300  False
10  300  False
d[3]
     ID  value
3   100  False
8   200  False
13  300  False
BENY
  • 317,841
  • 20
  • 164
  • 234
2

Let's try following your logic:

# 1. all False up to first True
group1 = df.loc[df.groupby('ID')['value'].cumsum() == 0]

# 2. all False after last True
group2 = df.loc[df.iloc[::-1].groupby('ID')['value'].cumsum()==0]

# 3. all True
group3 = df[df['value']]

Output:

    ID      value
0   100     False
1   100     False
4   200     False
9   300     False
10  300     False

    ID      value
3   100     False
8   200     False
13  300     False

    ID      value
2   100     True
5   200     True
6   200     True
7   200     True
11  300     True
12  300     True
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
1

This works for your example data

df['groups'] = df.groupby('ID').value.apply(lambda x: x.diff().ne(False).cumsum()).astype('int')
for _,df_groups in df.groupby('groups'):
  print(df_groups)
  print('-'*20)

Out:

     ID  value  groups
0   100  False       1
1   100  False       1
4   200  False       1
9   300  False       1
10  300  False       1
--------------------
     ID  value  groups
2   100   True       2
5   200   True       2
6   200   True       2
7   200   True       2
11  300   True       2
12  300   True       2
--------------------
     ID  value  groups
3   100  False       3
8   200  False       3
13  300  False       3
--------------------
Michael Szczesny
  • 4,911
  • 5
  • 15
  • 32