How to remove subsequent records in pandas dataframe subject to condition

Question

I have a pandas dataframe that I have created as follows:

import pandas as pd

ds1 = {'col1':[1,2,3,4,5,6,7], "col2" : [1,1,0,1,1,1,1]}

df1 = pd.DataFrame(data=ds1)

The dataframe looks like this:

print(df1)
   col1  col2
0     1     1
1     2     1
2     3     0
3     4     1
4     5     1
5     6     1
6     7     1

As soon as col2 is equal to 0, I want to remove all of the subsequent records, regardless of their values. In this case, the resulting dataframe would look like this:

   col1  col2
0     1     1
1     2     1
2     3     0

Another example.

import pandas as pd

ds1 = {'col1':[1,2,3,4,5,6,7], "col2" : [0,0,0,1,1,1,1]}

df1 = pd.DataFrame(data=ds1)

In this case the resulting dataframe would look like this:

   col1  col2
0     1     0

Does anyone know how to do this in Python?

I also have an additional question.

import pandas as pd


ds1 = {'col1':[1,1,1,1,1,1,1, 2,2,2,2,2,2,2], "col2" : [1,1,0,1,1,1,1,1,1,0,1,1,1,1]}

df1 = pd.DataFrame(data=ds1)
print(df1)

    col1  col2
0      1     1
1      1     1
2      1     0
3      1     1
4      1     1
5      1     1
6      1     1
7      2     1
8      2     1
9      2     0
10     2     1
11     2     1
12     2     1
13     2     1

I need to remove the records under the same condition as above, but separately for each value of col1. So the resulting dataframe would look like this:

    col1  col2
0      1     1
1      1     1
2      1     0
7      2     1
8      2     1
9      2     0

Ciao Giampaolo, it's not quite clear to me how you are applying the rule in the last example. There are no zeros in `col1`. – rpanai, Apr 27 '23 at 16:54

score 2 · Answer 1 · answered Apr 27 '23 at 13:58

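You could use idxmin, which returns the index label of the first occurrence of the minimum (here, the first 0 in col2):

out = df1.loc[:df1.col2.idxmin()]
print(out)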
   col1  col2
0     1     1
1     2     1
2     3     0
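
As a quick check (a sketch, not part of the original answer), the same one-liner can be run on the second example from the question. It also illustrates an assumption the approach relies on: 0 has to be the smallest value in col2. The frames `df2` and `df3` are names made up for this illustration.

```python
import pandas as pd

# Second example from the question: col2 starts with zeros.
df2 = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6, 7],
                    "col2": [0, 0, 0, 1, 1, 1, 1]})

# idxmin() returns the label of the FIRST occurrence of the minimum,
# and .loc slicing includes the end label, so only row 0 is kept.
out2 = df2.loc[:df2["col2"].idxmin()]
print(out2)

# Caveat: if col2 can hold values below 0, the minimum is no longer
# the first 0 and the cut lands on the wrong row.
df3 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [1, 0, 1, -1]})
out3 = df3.loc[:df3["col2"].idxmin()]  # cuts at the -1, not at the 0
print(out3)
```

If col2 only ever contains 0 and 1, as in the question, the caveat does not apply. Note also that when col2 contains no 0 at all, idxmin points at the first row, so only that row would be kept.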

BENY

PaulS · Answer 2 · 2023-04-27T14:30:52.527

1

Another possible solution:

df1.iloc[0:(1+df1['col2'].eq(0).idxmax()), :]

Alternatively,

df1[~df1['col2'].eq(0).cummax().shift(fill_value=False)]

Output:

   col1  col2
0     1     1
1     2     1
2     3     0
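
To see why the cummax/shift mask works, here is a small sketch that splits it into its intermediate steps. The variable names (`is_zero`, `seen_zero`, `after_zero`, `steps`) are chosen for this illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6, 7],
                    "col2": [1, 1, 0, 1, 1, 1, 1]})

# True only on the rows where col2 == 0
is_zero = df1["col2"].eq(0)

# Running maximum of a boolean Series: True from the first zero onwards
seen_zero = is_zero.cummax()

# Shift down by one row so the zero row itself stays False:
# True only on rows strictly after the first zero
after_zero = seen_zero.shift(fill_value=False)

# Side-by-side view of the three steps
steps = pd.DataFrame({"is_zero": is_zero,
                      "seen_zero": seen_zero,
                      "after_zero": after_zero})
print(steps)

# Keep everything that is not strictly after the first zero
result = df1[~after_zero]
print(result)
```

One property of the mask version: if col2 contains no 0 at all, every entry of the mask is False and the whole dataframe is kept.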

edited Apr 27 '23 at 14:30

answered Apr 27 '23 at 14:14


RomanPerekhrest · Accepted Answer · 2023-04-27T16:02:03.167

1

Selecting by integer location (this assumes the default RangeIndex, so that index labels equal row positions):

df = df1.iloc[:df1[df1['col2'].eq(0)].index[0] + 1]

Or alternatively with df.loc:

df = df1.loc[:df1[df1['col2'].eq(0)].index[0]]

   col1  col2
0     1     1
1     2     1
2     3     0
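
Both variants above raise an IndexError when col2 contains no 0, because `.index[0]` is taken from an empty selection. A possible guard, as a sketch only: the helper name `truncate_after_first_zero` is hypothetical, and it works on positions so it does not depend on the index labels.

```python
import pandas as pd

def truncate_after_first_zero(df, col="col2"):
    """Hypothetical helper: keep rows up to and including the first 0 in `col`.

    Returns a copy of the whole frame when `col` contains no 0.
    """
    is_zero = df[col].eq(0).to_numpy()
    if not is_zero.any():
        return df.copy()
    first_pos = is_zero.argmax()  # position of the first True
    return df.iloc[: first_pos + 1]

# Works with a non-default index as well
df_labelled = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [1, 0, 1, 1]},
                           index=list("abcd"))
print(truncate_after_first_zero(df_labelled))

# No zero present: nothing is removed
df_no_zero = pd.DataFrame({"col1": [1, 2], "col2": [1, 1]})
print(truncate_after_first_zero(df_no_zero))
```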

Solution for your 2nd question:

df = (df1.groupby('col1', as_index=False)
      .apply(lambda x: x.loc[:x[x['col2'].eq(0)].index[0]])
      .reset_index(drop=True))

   col1  col2
0     1     1
1     1     1
2     1     0
3     2     1
4     2     1
5     2     0
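
An alternative sketch for the grouped case, not from the original answer: count the zeros seen so far within each col1 group and keep the rows that have no zero before them. This avoids `apply`, keeps the original index labels (0, 1, 2, 7, 8, 9, as in the question's expected output), and keeps a whole group if it contains no 0. The variable names are made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "col1": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
    "col2": [1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
})

# 1 where col2 == 0, else 0
is_zero = df1["col2"].eq(0).astype(int)

# Running count of zeros within each col1 group, including the current row
zeros_so_far = is_zero.groupby(df1["col1"]).cumsum()

# Number of zeros strictly before the current row in its group
zeros_before = zeros_so_far - is_zero

# Keep rows with no earlier zero in their group
result = df1[zeros_before.eq(0)]
print(result)
```

Add `.reset_index(drop=True)` if a fresh 0..n-1 index is preferred, as in the answer above.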

edited Apr 27 '23 at 16:02

answered Apr 27 '23 at 14:14


Hi Roman, thanks. Would it be possible to do it by `col1`? For example, with `ds1 = {'col1':[1,1,1,1,1,1,1, 2,2,2,2,2,2,2], "col2" : [1,1,0,1,1,1,1,1,1,0,1,1,1,1]}` and `df1 = pd.DataFrame(data=ds1)`, I would want to remove the records separately for col1 equal to 1 and for col1 equal to 2. – Giampaolo Levorato Apr 27 '23 at 14:24
