I have started working at a company where we use a lot of data tables. Most of them have no column descriptions, and for categorical columns the meanings of the categories are usually undocumented. My plan is to send a list of the categorical columns and their categories to our business partners and ask them to fill in what each category means.
But how can I find out which columns are categorical? I cannot do it manually, because there are more than 20 tables with 70-80 columns each.
Some approaches I have thought of:
- Checking the distribution of values in each column.
- Computing the ratio of unique values to the total number of rows: if it is above a threshold, the column is probably numerical; otherwise it is probably categorical.
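A minimal sketch of the unique-value-ratio idea in plain Python, extended to emit a fill-in sheet for the business partners. Everything here is an assumption for illustration: each table is loaded as a dict mapping column names to lists of values, missing values are `None` or empty strings, and the function names and thresholds (`MAX_UNIQUE_RATIO`, `MAX_DISTINCT`) are hypothetical and should be tuned on a few columns you label by hand. With pandas you could feed it `df[col].dropna().tolist()` per column.

```python
import csv
import sys
from collections import Counter
from typing import Any, Dict, List, Sequence

# Hypothetical thresholds: tune them on a handful of hand-labelled columns.
MAX_UNIQUE_RATIO = 0.05  # numeric column: at most 5% of non-missing values distinct
MAX_DISTINCT = 50        # never treat a column with more than 50 levels as categorical


def _is_number(value: Any) -> bool:
    """True for ints/floats and for strings that parse as a number (bools excluded)."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return True
    try:
        float(str(value))
        return True
    except ValueError:
        return False


def looks_categorical(values: Sequence[Any],
                      max_unique_ratio: float = MAX_UNIQUE_RATIO,
                      max_distinct: int = MAX_DISTINCT) -> bool:
    """Heuristic guess: few distinct values relative to the row count => categorical."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return False  # an all-missing column gives no evidence either way
    distinct = len(set(present))
    ratio = distinct / len(present)
    if not all(_is_number(v) for v in present):
        # Text columns: a modest number of levels usually means codes or labels.
        return distinct <= max_distinct
    # Numeric columns: require both a low unique ratio and few levels,
    # so that amounts and measurements stay classified as numerical.
    return ratio <= max_unique_ratio and distinct <= max_distinct


def category_template(tables: Dict[str, Dict[str, List[Any]]]) -> List[Dict[str, Any]]:
    """One row per (table, column, category), with an empty 'meaning' to fill in."""
    rows = []
    for table_name, columns in tables.items():
        for column_name, values in columns.items():
            if not looks_categorical(values):
                continue
            present = [v for v in values if v is not None and v != ""]
            # Most frequent categories first, so partners see the important ones on top.
            for category, count in Counter(present).most_common():
                rows.append({"table": table_name, "column": column_name,
                             "category": category, "count": count, "meaning": ""})
    return rows


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    orders = {
        "status": ["A", "C", "A", "X", "C", "A"] * 50,
        "amount": [round(10.5 + i * 0.37, 2) for i in range(300)],
        "region_code": [1, 2, 3] * 100,
    }
    writer = csv.DictWriter(sys.stdout,
                            fieldnames=["table", "column", "category", "count", "meaning"])
    writer.writeheader()
    writer.writerows(category_template({"orders": orders}))
```

On the toy data, `status` and `region_code` are flagged and `amount` is not. Numeric codes pass the ratio test, but so would low-cardinality numbers such as 1-5 ratings; treating those as categorical is usually harmless here, since their meaning still needs confirming with the business partners.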
Does anyone have other ideas?