I'm writing a data pre-processor for machine learning that needs to treat boolean data as categories rather than treating 1 as bigger than 0. After importing a CSV table into a pandas DataFrame, I want to identify the columns that contain only boolean values and cast them to boolean type, without iterating through all numeric columns. Pandas reads such 0/1 columns as 'int64', and I haven't found an existing method that solves this.
I've tried NumPy's 'safe' casting, but it fails: instead of checking whether any values don't fit into a boolean, it refuses to downcast from any other type:
import pandas as pd
df = pd.DataFrame({'a':[1, 0, 1]})
numpy_array = df.values
safe_booleans = numpy_array.astype(bool, casting='safe')
Cannot cast array from dtype('int64') to dtype('bool') according to the rule 'safe'
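As far as I can tell, the 'safe' rule is decided from the two dtypes alone, so the values in the array never come into it. A minimal sketch with np.can_cast that seems to confirm this (the variable names are just for illustration):

```python
import numpy as np

# The casting rule is evaluated on dtypes only; no array values are inspected.
int_to_bool = np.can_cast(np.int64, np.bool_, casting='safe')
bool_to_int = np.can_cast(np.bool_, np.int64, casting='safe')

print(int_to_bool)  # False: int64 -> bool is never considered 'safe'
print(bool_to_int)  # True: bool -> int64 cannot lose information
```

So an int64 array holding only 0 and 1 is rejected exactly like one holding arbitrary integers.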
If I drop casting='safe', the cast works, but I need that safety check because the table also contains non-boolean columns, which astype would otherwise turn into booleans with loss of data.
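To illustrate the data loss I mean, here is a small sketch with an extra, made-up column 'b' that is clearly not boolean:

```python
import pandas as pd

# 'a' is a genuine 0/1 column; 'b' is a hypothetical non-boolean column.
df = pd.DataFrame({'a': [1, 0, 1], 'b': [2, 5, 0]})

# An unchecked cast maps every non-zero value to True.
unsafe = df.astype(bool)
print(unsafe)
```

After the cast, the 2 and 5 in column 'b' are both True and can no longer be told apart.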
Much obliged if you could point me to my mistake or suggest other methods which would turn numeric columns/arrays with only boolean values into boolean type.
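For reference, this is roughly the behaviour I'm after, sketched with DataFrame.isin on made-up columns. It assumes a column counts as boolean when every value is 0 or 1, and it still scans each numeric column internally, so I'm not sure it is the right approach:

```python
import pandas as pd

# Hypothetical sample data: only 'a' holds nothing but 0 and 1.
df = pd.DataFrame({'a': [1, 0, 1], 'b': [2, 5, 0], 'c': [0.5, 1.0, 0.0]})

# Restrict the check to numeric columns, then test every value against {0, 1}.
numeric = df.select_dtypes(include='number')
is_binary = numeric.isin([0, 1]).all()

# Names of the columns where every value passed the test.
bool_cols = is_binary[is_binary].index

# Cast only those columns; the others keep their dtype and values.
converted = df.astype({col: bool for col in bool_cols})
print(converted.dtypes)
```

Under that assumption a float column containing only 0.0 and 1.0 would also be converted, and a column with missing values would be left alone because NaN is not in [0, 1].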