You should compute a frequency table between two categorical variables using pd.crosstab() and perform subsequent analyses based on this table. df.corr(x, y)
is NOT mathematically meaningful when one of x
and y
is categorical, no matter it is encoded into number or not.
N.B.1 If x
is categorical but y
is numerical, there are two options to describe the linkage between them:
- Group
y
into quantiles (bins) and treat it as categorical
- Perform a linear regression of
y
against one-hot encoded dummy variables of x
Option 2 is more precise in general but the statistics is beyond the scope of this question. This post will focus on the case of two categorical variables.
N.B.2 For sparse matrix output please see this post.
Sample Solution
Data & Preprocessing
import pandas as pd
import io
import matplotlib.pyplot as plt
from seaborn import heatmap
df = pd.read_csv(io.StringIO("""
year length Animation
0 1971 121 1,2,3
1 1939 71 1,3
2 1941 7 0,2
3 1996 70 1,2,0
4 1975 71 3,2,0
"""), sep=r"\s{2,}", engine="python")
# convert string to list
df["Animation"] = df["Animation"].str.split(',')
# expand list column into new rows
df = df.explode("Animation")
# (optional)
df["Animation"] = df["Animation"].astype(int)
Frequency Table
Note: grouping of length
is ignored for simplicity
ct = pd.crosstab(df["Animation"], df["length"])
print(ct)
# Out[65]:
# length 7 70 71 121
# Animation
# 0 1 1 1 0
# 1 0 1 1 1
# 2 1 1 1 1
# 3 0 0 2 1
Visualization
ax = heatmap(ct, cmap="viridis",
yticklabels=df["Animation"].drop_duplicates().sort_values(),
xticklabels=df["length"].drop_duplicates().sort_values(),
)
ax.set_title("Title", fontsize=20)
plt.show()

Example Analysis
Based on the frequency table, you can ask questions about the distribution of y
given a certain (subset of) x
value(s), or vice versa. This should better describe the linkage between two categorical variables, as the categorical variables have no order.
For example,
Q: What length does Animation=3 produces?
A: 66.7% chance to give 71
33.3% chance to give 121
otherwise unobserved