python: undetected categorical values

Question

I would like to find out which columns of a dataframe are categorical. This dataframe does contain one, column z, but my code doesn't detect it and prints an empty list. How should I fix it?

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = [[10, 10, 'a'],
        [15, 15, 'a'],
        [14, 14, 'b'],
        [16, 16, 'b'],
        [19, 19, 'a'],
        [17, 17, 'a'],
        [6, 6, 'c'],
        [5, 5, 'b'],
        [20, 20, 'c'],
        [22, 22, 'c'],
        [21, 21, 'b'],
        [18, 45, 'a']]
df = pd.DataFrame(data, columns=['x','y','z'])
categorical_values=[]
for i in df.columns.values.tolist():
    if (type(df[i].all()))==str:
        categorical_values.append(i)

print(categorical_values, 'CATEGORICAL VALUES')
print(len(categorical_values),'total of categorical variables')

Cannot replicate, prints `['z'] CATEGORICAL VALUES` and `1 total of categorical variables` (pandas 1.2.1, numpy 1.19.1) — dm2, Jul 10 '21 at 10:56
Does this answer your question? https://stackoverflow.com/a/65569109/16310106 — , Jul 10 '21 at 11:02
Use `dataframe.column.dtype` to get the type of the column, then compare it with the type you are looking for. — , Jul 10 '21 at 11:04
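Following that comment, here is a minimal sketch of the question's loop rewritten to look at each column's dtype. It swaps the `type(...) == str` test for `pandas.api.types.is_string_dtype`, which is true for columns holding strings (the default object dtype, or pandas' dedicated string dtype); the trimmed three-row DataFrame is only for illustration.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# A trimmed version of the question's data
df = pd.DataFrame([[10, 10, 'a'], [15, 15, 'a'], [14, 14, 'b']],
                  columns=['x', 'y', 'z'])

categorical_values = []
for col in df.columns:
    # Inspect the column's dtype instead of a value computed from the column
    if is_string_dtype(df[col]):
        categorical_values.append(col)

print(categorical_values, 'CATEGORICAL VALUES')
print(len(categorical_values), 'total of categorical variables')
```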

score 0 · Answer 1 · answered Jul 10 '21 at 11:04

What seems wrong here is your test `if (type(df[i].all()))==str`. Let's decompose it:

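1. get column i
2. check whether all values of that column are True (see the documentation for `.all()`):

   Series.all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)

   Return whether all elements are True, potentially over an axis.

   Returns True unless there is at least one element within a series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).

3. get the type of that result
4. check whether this type is str

Since `.all()` answers "is every value truthy?", its result is not meant to describe what the column holds, so checking its type is not a reliable way to find string columns.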
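To see what `.all()` actually reports, here is a small sketch on a hypothetical two-row DataFrame in which column x contains a zero:

```python
import pandas as pd

# Hypothetical two-row frame: column x contains a 0, column y does not
df = pd.DataFrame([[10, 10, 'a'], [0, 15, 'b']], columns=['x', 'y', 'z'])

# .all() answers "is every value truthy?" (0 counts as False)
print(df['x'].all())   # False
print(df['y'].all())   # True

# For numeric columns the result is a NumPy boolean, not a str
print(type(df['y'].all()))
```

The result tells you about the values' truthiness, not about the column's type.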

You seem to want to check the data types of your columns. For that, use `df.dtypes`:

>>> df.dtypes
x     int64
y     int64
z    object
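Building on `df.dtypes`, here is a sketch that collects column names by dtype. It assumes every non-numeric column counts as categorical and uses `pandas.api.types.is_numeric_dtype` for the check, so with the question's data only z is kept (note that boolean columns would count as numeric here):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame([[10, 10, 'a'], [15, 15, 'a'], [14, 14, 'b']],
                  columns=['x', 'y', 'z'])

# df.dtypes is a Series mapping each column name to its dtype
for col, dtype in df.dtypes.items():
    print(col, dtype)

# Keep the columns whose dtype is not numeric
categorical_values = [col for col, dtype in df.dtypes.items()
                      if not is_numeric_dtype(dtype)]
print(categorical_values)
```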

You can even select columns by dtype directly from the dataframe:

>>> df.select_dtypes(include=['object'])
    z
0   a
1   a
2   b
3   b
4   a
5   a
6   c
7   b
8   c
9   c
10  b
11  a
>>> categorical_values = df.select_dtypes(include=['object']).columns.to_list()
>>> categorical_values
['z']
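If some columns already use pandas' own `category` dtype, `select_dtypes` can pick those up as well. A sketch, assuming a hypothetical extra column w converted to `category`:

```python
import pandas as pd

df = pd.DataFrame([[10, 10, 'a'], [15, 15, 'a'], [14, 14, 'b']],
                  columns=['x', 'y', 'z'])
# Hypothetical extra column stored with pandas' 'category' dtype
df['w'] = pd.Series(['lo', 'hi', 'lo'], dtype='category')

# 'object' matches plain string columns such as z, 'category' matches w
categorical_values = df.select_dtypes(include=['object', 'category']).columns.to_list()
print(categorical_values)
```

With the pandas versions discussed in this thread, this should list both z and w, while the numeric columns x and y are left out.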
