Pandas filter rows based on condition, but always retain the first row

Question

I would like to drop some rows that meets certain conditions but I do not want to drop the first row even if the first row meets that criteria.

I tried dropping rows by using the df.drop function but it will erase the first row if the first row meets that condition. I do not want that.

Data looks something like this:

Column1 Column2 Column3
  1        3      A
  2        1      B
  3        3      C
  4        1      D
  5        1      E
  6        3      F

I want to do it in a way that if a row has a value of 3 in column2 then drop it.

And I want the new data to be like this (after dropping but keeping the first one even though the first row had a value of 3 in column 2):

Column1 Column2 Column3
  1        3      A
  2        1      B
  4        1      D
  5        1      E

cs95 · Accepted Answer · 2019-03-28T01:34:20.200

2

You can make "retain first row" a part of your condition for dropping/keeping rows.

The condition for keeping the rows would be <ORIGINAL CONDITION> or <CONDITION TO KEEP FIRST ROW>. In code, this is

# (condition to drop 3) | (condition to keep 0th row)
df[(df['Column2'] != 3) | (df.index == 0)]

   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

Inversely, using DeMorgan's Laws, the condition for dropping rows would be (df['Column2'] == 3) & (df.index != 0). We then invert the condition to get our expected output,

df[~((df['Column2'] == 3) & (df.index != 0))]

   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

These work assuming your index is a RangeIndex. If not, use pd.RangeIndex(len(df)) == 0 as the second condition instead.

edited Mar 28 '19 at 01:34

answered Mar 28 '19 at 01:09

cs95

379,657
97
704
746

Thank you for your answer. Would it be okay to ask an additional question? Let's say if I had other values in Column2 so basically I only want to keep 1 and remove any other values (not just 3 but like 2, 4, 5, 6, 7 and so on). – cck1110 Mar 28 '19 at 01:20
1

@cck1110 You can use `isin`. More information [here](https://stackoverflow.com/questions/19960077/how-to-implement-in-and-not-in-for-pandas-dataframe). – cs95 Mar 28 '19 at 01:24
1

@cck1110 You can also write your condition as `df[(df['Column2'] == 1) | (df.index == 0)]` if 1 is the only thing you want to keep. – cs95 Mar 28 '19 at 01:30
I tried the method df[(df['Column2'] == 1) | (df.index == 0)] but if the first row's value for [column2] is "2" for example, then it will remove that first row as well. – cck1110 Mar 28 '19 at 01:33
1

@cck1110 I tried exactly that and it works for me. Maybe your index is different. Try: `df2 = df[(df['Column2'] == 1) | (pd.RangeIndex(len(df)) == 0)]` – cs95 Mar 28 '19 at 01:35
1

@cck1110 ha! I knew it. That was in my answer from way before. Be sure to read my answer more carefully next time :p – cs95 Mar 28 '19 at 01:39

score 2 · Answer 2 · answered Mar 28 '19 at 01:10

2

I am using duplicated

df[(~df.Column2.duplicated())|df.Column2.ne(3)]
   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

answered Mar 28 '19 at 01:10

BENY

317,841
20
164
234

score 2 · Answer 3 · answered Mar 28 '19 at 01:21

Already great answers posted here, but just for your convenience. We can also use cumcount for this to rank Nth 3 we find:

df = df[~((df.groupby('Column2').Column2.cumcount() != 0) & (df.Column2 == 3))]

print(df)
   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

Pandas filter rows based on condition, but always retain the first row

3 Answers3