3

I would like to drop some rows that meets certain conditions but I do not want to drop the first row even if the first row meets that criteria.

I tried dropping rows by using the df.drop function but it will erase the first row if the first row meets that condition. I do not want that.

Data looks something like this:

Column1 Column2 Column3
  1        3      A
  2        1      B
  3        3      C
  4        1      D
  5        1      E
  6        3      F

I want to do it in a way that if a row has a value of 3 in column2 then drop it.

And I want the new data to be like this (after dropping but keeping the first one even though the first row had a value of 3 in column 2):

Column1 Column2 Column3
  1        3      A
  2        1      B
  4        1      D
  5        1      E
cs95
  • 379,657
  • 97
  • 704
  • 746
cck1110
  • 55
  • 4

3 Answers3

2

You can make "retain first row" a part of your condition for dropping/keeping rows.

The condition for keeping the rows would be <ORIGINAL CONDITION> or <CONDITION TO KEEP FIRST ROW>. In code, this is

# (condition to drop 3) | (condition to keep 0th row)
df[(df['Column2'] != 3) | (df.index == 0)]

   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

Inversely, using DeMorgan's Laws, the condition for dropping rows would be (df['Column2'] == 3) & (df.index != 0). We then invert the condition to get our expected output,

df[~((df['Column2'] == 3) & (df.index != 0))]

   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

These work assuming your index is a RangeIndex. If not, use pd.RangeIndex(len(df)) == 0 as the second condition instead.

cs95
  • 379,657
  • 97
  • 704
  • 746
  • Thank you for your answer. Would it be okay to ask an additional question? Let's say if I had other values in Column2 so basically I only want to keep 1 and remove any other values (not just 3 but like 2, 4, 5, 6, 7 and so on). – cck1110 Mar 28 '19 at 01:20
  • 1
    @cck1110 You can use `isin`. More information [here](https://stackoverflow.com/questions/19960077/how-to-implement-in-and-not-in-for-pandas-dataframe). – cs95 Mar 28 '19 at 01:24
  • 1
    @cck1110 You can also write your condition as `df[(df['Column2'] == 1) | (df.index == 0)]` if 1 is the only thing you want to keep. – cs95 Mar 28 '19 at 01:30
  • I tried the method df[(df['Column2'] == 1) | (df.index == 0)] but if the first row's value for [column2] is "2" for example, then it will remove that first row as well. – cck1110 Mar 28 '19 at 01:33
  • 1
    @cck1110 I tried exactly that and it works for me. Maybe your index is different. Try: `df2 = df[(df['Column2'] == 1) | (pd.RangeIndex(len(df)) == 0)]` – cs95 Mar 28 '19 at 01:35
  • 1
    @cck1110 ha! I knew it. That was in my answer from way before. Be sure to read my answer more carefully next time :p – cs95 Mar 28 '19 at 01:39
2

I am using duplicated

df[(~df.Column2.duplicated())|df.Column2.ne(3)]
   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E
BENY
  • 317,841
  • 20
  • 164
  • 234
2

Already great answers posted here, but just for your convenience. We can also use cumcount for this to rank Nth 3 we find:

df = df[~((df.groupby('Column2').Column2.cumcount() != 0) & (df.Column2 == 3))]

print(df)
   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E
Erfan
  • 40,971
  • 8
  • 66
  • 78