
I am working with a DataFrame in pandas.

import pandas as pd
df = pd.read_csv('Amazon_Historical_StockPrice2.csv', parse_dates=['Date'], index_col='Date')

I need to select only the binary columns for preprocessing. I tried filtering on the column values, keeping columns whose values are all greater than or equal to 0 and less than or equal to 1, but this condition does not exclude fractional values inside that interval. Is there a way to do this automatically? There are too many columns to pick them out by hand.
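
To illustrate the problem, here is a minimal sketch of the range check described above. The frame `toy` and its column names are hypothetical and are not taken from the stock-price file; the point is only that a column of fractions inside [0, 1] passes the check along with the truly binary column.

```python
import pandas as pd

# Hypothetical toy frame, not the stock-price data from the question.
toy = pd.DataFrame({
    "flag": [0, 1, 1, 0],            # truly binary
    "ratio": [0.25, 0.5, 1.0, 0.0],  # fractions inside [0, 1]
    "count": [3, 7, 1, 0],           # values outside [0, 1]
})

# Range check: keep columns whose values all lie in [0, 1] (bounds inclusive).
in_range = [col for col in toy if toy[col].between(0, 1).all()]
print(in_range)  # ['flag', 'ratio']
```

The "ratio" column is kept even though it is not binary, which is the behaviour I want to avoid.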

Thanks!

Lidor Eliyahu Shelef

1 Answer

import numpy as np
import pandas as pd
df = pd.DataFrame({
   "id": [100, 100, 101, 102, 103, 104, 105, 106],
   "A": [1, 2, 3, 4, 5, 2, np.nan, 5],
   "B": [45, 56, 48, 47, 62, 112, 54, 49],
   "C": [1.2, 1.4, 1.1, 1.8, np.nan, 1.4, 1.6, 1.5],
   "Binary": [1, 1, 0, 1, 1, 0, 1, 0]
})

df

    id    A    B    C  Binary
0  100  1.0   45  1.2       1
1  100  2.0   56  1.4       1
2  101  3.0   48  1.1       0
3  102  4.0   47  1.8       1
4  103  5.0   62  NaN       1
5  104  2.0  112  1.4       0
6  105  NaN   54  1.6       1
7  106  5.0   49  1.5       0

Use a list comprehension to collect the names of the binary columns, i.e. the columns whose non-missing values are all either 0 or 1:

binary_cols = [
    col for col in df
    if np.isin(df[col].dropna().unique(), [0, 1]).all()
]

df[binary_cols]

Output:

   Binary
0       1
1       1
2       0
3       1
4       1
5       0
6       1
7       0
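
One edge case to be aware of: a column that contains only NaN is left empty after `dropna()`, and `.all()` on an empty array is True, so the comprehension above would report it as binary. The following is a minimal sketch of one way to guard against that, assuming such columns should be excluded. The helper name `find_binary_columns` and the `example` frame are made up for illustration.

```python
import numpy as np
import pandas as pd

def find_binary_columns(frame):
    """Return names of columns whose non-missing values are all 0 or 1."""
    result = []
    for col in frame.columns:
        values = frame[col].dropna().unique()
        # Skip columns with no observed values: .all() on an empty array is True.
        if len(values) == 0:
            continue
        if np.isin(values, [0, 1]).all():
            result.append(col)
    return result

# Hypothetical frame for illustration only.
example = pd.DataFrame({
    "Binary": [1, 1, 0, 1],
    "Fraction": [0.0, 0.5, 1.0, 0.25],
    "Empty": [np.nan, np.nan, np.nan, np.nan],
})
print(find_binary_columns(example))  # ['Binary']
```

If all-NaN columns should count as binary in your data, drop the `len(values) == 0` check and the function behaves like the list comprehension.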
ali bakhtiari