
I have a DataFrame in which I one-hot encoded 28 text classes for a multi-label task. How can I drop a row if any value in the row equals zero, except for one column that I choose?

I'm using Jupyter with Python 3, and I can't get it to work.

desertnaut

1 Answer


In a one-hot encoded dataframe, every row has exactly one column set to 1 and all the others set to 0, like:

>>> df
   A  B  C  D
0  0  1  0  0
1  0  1  0  0
2  0  0  0  1
3  0  0  0  1
4  0  0  1  0
5  0  1  0  0
6  0  1  0  0
7  0  1  0  0
8  1  0  0  0
9  0  0  0  1

So to drop all rows where B is 1 (and, by construction, every other column is 0), filtering on B alone is enough:

df = df[~df['B'].eq(1)]

# OR

df = df[df['B'].eq(0)]

Output:

>>> df
   A  B  C  D
2  0  0  0  1
3  0  0  0  1
4  0  0  1  0
8  1  0  0  0
9  0  0  0  1
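
To reproduce this in a notebook, here is a minimal, self-contained sketch. The frame is rebuilt by hand from the example values above, and the variable name `filtered` is only illustrative:

```python
import pandas as pd

# Rebuild the one-hot example frame (values copied from the table above).
df = pd.DataFrame(
    {
        "A": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
        "B": [1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
        "C": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        "D": [0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
    }
)

# Keep only the rows where B is 0; the original index labels are preserved.
filtered = df[df["B"].eq(0)]
print(filtered)
```

Boolean indexing returns a new frame and leaves `df` untouched, so assign the result back to `df` only if you no longer need the dropped rows.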

Update

If your dataframe contains only 0s and 1s but a row can have several 1s (the multi-label case), like:

>>> df
   A  B  C  D
0  0  1  0  0  # HERE
1  0  0  1  1
2  0  0  1  0
3  0  1  0  0  # HERE
4  0  0  1  1
5  1  1  1  1
6  1  0  1  0
7  1  0  1  0
8  0  1  1  0
9  1  0  1  1

You can use:

m1 = df['B'].eq(1)  # check if target column is 1
m2 = df.drop(columns='B').eq(0).all(axis=1)  # check if others are 0
df = df[~(m1 & m2)]

Output:

>>> df
   A  B  C  D
1  0  0  1  1
2  0  0  1  0
4  0  0  1  1
5  1  1  1  1
6  1  0  1  0
7  1  0  1  0
8  0  1  1  0
9  1  0  1  1
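
If you need this for more than one column, the two masks can be wrapped in a small helper. This is only a sketch: the function name `drop_only_target_rows` is made up for illustration, and it assumes every column holds only 0 or 1:

```python
import pandas as pd


def drop_only_target_rows(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Drop rows where `target` is 1 and every other column is 0."""
    is_target = df[target].eq(1)                             # target column is set
    others_zero = df.drop(columns=target).eq(0).all(axis=1)  # nothing else is set
    return df[~(is_target & others_zero)]


# Multi-label example frame (values copied from the table above).
df = pd.DataFrame(
    {
        "A": [0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
        "B": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
        "C": [0, 1, 1, 0, 1, 1, 1, 1, 1, 1],
        "D": [0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    }
)

result = drop_only_target_rows(df, "B")
print(result)
```

Only rows 0 and 3 are removed here; rows 5 and 8 also have B set to 1 but stay, because another label is set as well.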
Corralien