
I have started working at a company where we use a lot of data tables. Most of them don't have column descriptions, and when a column is categorical, the meanings of its categories are usually not documented. My solution is to send a list of the categorical columns and their categories to our business partners and ask them to fill in what each category means.

Can someone help me find out which columns are categorical? I cannot do it manually because there are more than 20 tables with 70-80 columns each.

Some solutions I could think of are:

  1. Checking the distribution of values in each column.
  2. Computing the ratio of unique values to the total number of rows: if it exceeds a threshold, treat the column as numerical.
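The second heuristic can be sketched in plain Python as below. This is a minimal sketch, not a standard method: the function name `suggest_categorical` and both thresholds (`max_unique_ratio=0.05`, `max_unique_count=20`) are hypothetical choices that you would tune per table. It flags a column only when it has few distinct values both in absolute terms and relative to its row count.

```python
from typing import Dict, Iterable, List


def suggest_categorical(
    columns: Dict[str, Iterable],
    max_unique_ratio: float = 0.05,  # hypothetical threshold
    max_unique_count: int = 20,      # hypothetical threshold
) -> List[str]:
    """Return the names of columns that look categorical."""
    flagged = []
    for name, values in columns.items():
        # Drop None and NaN (NaN is the only value not equal to itself).
        present = [v for v in values if v is not None and v == v]
        if not present:
            continue
        n_unique = len(set(present))
        ratio = n_unique / len(present)
        if n_unique <= max_unique_count and ratio <= max_unique_ratio:
            flagged.append(name)
    return flagged


rows = 100
table = {
    "customer_id": list(range(rows)),             # unique per row: not categorical
    "gender_code": [i % 2 for i in range(rows)],  # 0/1 codes: categorical
    "amount": [round(i * 1.37, 2) for i in range(rows)],
}
print(suggest_categorical(table))  # ['gender_code']
```

For a pandas DataFrame `df`, you could pass `{c: df[c].tolist() for c in df.columns}`. Requiring both a low ratio and a low absolute count avoids flagging, say, a column with 5,000 distinct values in a very large table.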

Does someone have any other ideas?

bazinga

1 Answer


Assuming your DataFrame is df, you can run:

df.dtypes

which returns the data type of each column in your DataFrame.
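A small illustration, using made-up column names and data (all hypothetical). `df.dtypes` reports one dtype per column, and `select_dtypes(exclude="number")` lists the columns that are already non-numeric (strings, booleans, `category`). Note that `gender_code` still shows up as an integer column, which is exactly the limitation raised in the comments below.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "gender_code": [0, 1, 1, 0],  # numeric codes, but semantically categorical
    "region": ["north", "south", "north", "east"],
    "amount": [12.5, 7.0, 3.25, 9.9],
})

# One dtype per column
print(df.dtypes)

# Columns whose dtype is not numeric: obvious categorical candidates
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['region']
```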

razimbres
  • What if a column contains only numbers but is actually categorical? For example, (0, 1) meaning male and female. – bazinga Feb 27 '19 at 13:35
  • You can run something like [np.unique(df.iloc[:,i]) for i in range(0,df.shape[1])] to see the distinct values (classes) in each column. Then decide what to do. – razimbres Feb 27 '19 at 13:42
  • Better than that: [len(np.unique(df.iloc[:,i])) for i in range(0,df.shape[1])] – razimbres Feb 27 '19 at 13:54
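pandas also provides `DataFrame.nunique()`, which gives the same per-column distinct counts as the list comprehension above, returns them as a Series labeled by column name, and ignores NaN by default. The sketch below combines those counts with thresholds and a dtype check to surface numeric columns that are probably coded categories, like the 0/1 gender example. The data, column names, and thresholds (`MAX_UNIQUE`, `MAX_RATIO`) are hypothetical; it is a heuristic for building the list to send to business partners, not a definitive classifier.

```python
import numpy as np
import pandas as pd

n = 1000
df = pd.DataFrame({
    "customer_id": np.arange(n),
    "gender_code": np.arange(n) % 2,         # 0/1 codes
    "rating": np.arange(n) % 5 + 1,          # 1..5 codes
    "region": ["north", "south", "east", "west"] * (n // 4),
    "amount": np.linspace(0.0, 500.0, n),
})

# Distinct values per column (NaN ignored by default)
n_unique = df.nunique()

# Hypothetical thresholds: few distinct values, overall and relative to row count
MAX_UNIQUE = 20
MAX_RATIO = 0.05
ratio = n_unique / len(df)
candidates = n_unique[(n_unique <= MAX_UNIQUE) & (ratio <= MAX_RATIO)].index.tolist()

# Numeric candidates are likely coded categories whose meanings need documenting
numeric_coded = [c for c in candidates if pd.api.types.is_numeric_dtype(df[c])]
print(candidates)     # ['gender_code', 'rating', 'region']
print(numeric_coded)  # ['gender_code', 'rating']
```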