I'm looking for help with the Pandas .corr() method.
As is, I can use the .corr() method to calculate a heatmap of every possible combination of columns:
corr = data.corr()
sns.heatmap(corr)
Which, on my dataframe of 23,000 columns, may terminate near the heat death of the universe.
I can also do the more reasonable correlation between a subset of values
data2 = data[list_of_column_names]
corr = data2.corr(method="pearson")
sns.heatmap(corr)
That gives me something that I can use--here's an example of what that looks like:
What I would like to do is compare a list of 20 columns with the whole dataset. The normal .corr() function can give me a 20x20 or 23,000x23,000 heatmap, but essentially I would like a 20x23,000 heatmap.
How can I add more specificity to my correlations?
Thanks for the help!