I have one-hot encoded a column 'postcode' and I want to see the correlation between it and wealth_segment, which has been label encoded as: mass customer = 0, affluent customer = 1 and high net worth customer = 2.
I want to see if there is a correlation between the postcode and the wealth of the customer. The thing is, I have many postcode columns because I have one-hot encoded it. The naming convention is postcode_XXXX (XXXX being a 4-digit number).
What can I write to find the correlation between only these two variables? I have over 100 other columns in the dataframe, so I do not want to simply use the df.corr() method.
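To make the goal concrete, here is a minimal sketch of the kind of thing I am after. The toy data and the names postcode_cols and postcode_corr are made up for illustration; only the postcode_XXXX naming and the wealth_segment column come from my real dataframe. It assumes Pearson correlation is acceptable, which treats the 0/1/2 labels as an ordered numeric scale.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataframe,
# which has 100+ other columns (represented here by "age").
df = pd.DataFrame({
    "postcode_2000": [1, 0, 0, 1, 0, 0],
    "postcode_3000": [0, 1, 0, 0, 1, 0],
    "postcode_4000": [0, 0, 1, 0, 0, 1],
    "wealth_segment": [2, 0, 1, 2, 0, 1],
    "age": [34, 51, 29, 45, 62, 38],
})

# Keep only the one-hot columns named postcode_XXXX (four digits).
postcode_cols = df.filter(regex=r"^postcode_\d{4}$")

# Pearson correlation of each postcode column with wealth_segment,
# sorted from most positive to most negative.
postcode_corr = (
    postcode_cols.corrwith(df["wealth_segment"])
    .sort_values(ascending=False)
)
print(postcode_corr)
```

This gives one correlation value per postcode column and never touches the unrelated columns, unlike df.corr(), which would compute the full matrix.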