Pearson Correlation after Normalization

Question

I want to normalize my data and compute a Pearson correlation. Without normalization it works, but with normalization I get this error message:

`AttributeError: 'numpy.ndarray' object has no attribute 'corr'`

What can I do to solve this problem?

import numpy as np
import pandas as pd


# raw string, so the backslashes in the Windows path are not read as escape sequences
filename_train = r'C:\Users\xxx.xxx\workspace\Dataset\!train_data.csv'
names = ['a', 'b', 'c', 'd', 'e', ...]
df_train = pd.read_csv(filename_train, names=names)

from sklearn.preprocessing import Normalizer
normalizeddf_train = Normalizer().fit_transform(df_train)

#pearson correlation
pd.set_option('display.width', 100)
pd.set_option('display.precision', 2)
print(normalizeddf_train.corr(method='pearson'))

Maybe you need to create a DataFrame from the NumPy array: `normalizeddf_train = pd.DataFrame(normalizeddf_train)` – jezrael, Oct 26 '16 at 12:10

score 6 · Accepted Answer · answered Oct 26 '16 at 12:12


You need the DataFrame constructor, because the output of fit_transform is a NumPy array, and `corr` is a method of DataFrame (DataFrame.corr), not of numpy.ndarray:

import pandas as pd

df_train = pd.DataFrame({'A': [1, 2, 3],
                         'B': [4, 5, 6],
                         'C': [7, 8, 9],
                         'D': [1, 3, 5],
                         'E': [5, 3, 6],
                         'F': [7, 4, 3]})

print(df_train)
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3

from sklearn.preprocessing import Normalizer
# Normalizer rescales each row (sample) to unit L2 norm and returns a NumPy ndarray
normalizeddf_train = Normalizer().fit_transform(df_train)
print(normalizeddf_train)
[[ 0.08421519  0.33686077  0.58950634  0.08421519  0.42107596  0.58950634]
 [ 0.1774713   0.44367825  0.70988521  0.26620695  0.26620695  0.3549426 ]
 [ 0.21428571  0.42857143  0.64285714  0.35714286  0.42857143  0.21428571]]

print(pd.DataFrame(normalizeddf_train).corr(method='pearson'))
          0         1         2         3         4         5
0  1.000000  0.917454  0.646946  0.998477 -0.203152 -0.994805
1  0.917454  1.000000  0.896913  0.894111 -0.575930 -0.872187
2  0.646946  0.896913  1.000000  0.603899 -0.878063 -0.565959
3  0.998477  0.894111  0.603899  1.000000 -0.148832 -0.998906
4 -0.203152 -0.575930 -0.878063 -0.148832  1.000000  0.102420
5 -0.994805 -0.872187 -0.565959 -0.998906  0.102420  1.000000
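For reference, a minimal self-contained sketch of the same fix. A bare `ndarray` has no `corr` attribute, which is what raised the AttributeError; wrapping it in a `DataFrame` makes `DataFrame.corr` available. As an alternative that stays in NumPy, `np.corrcoef(..., rowvar=False)` treats columns as variables and should give the same Pearson matrix when there are no missing values. To avoid a scikit-learn dependency, this sketch reproduces `Normalizer`'s default `norm='l2'` row scaling by hand; that substitution is an assumption of the sketch, not part of the answer.

```python
import numpy as np
import pandas as pd

df_train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9],
                         'D': [1, 3, 5], 'E': [5, 3, 6], 'F': [7, 4, 3]})

# Stand-in for sklearn's Normalizer() with its default norm='l2':
# each row (sample) is divided by its Euclidean length.
values = df_train.to_numpy(dtype=float)
normalized = values / np.linalg.norm(values, axis=1, keepdims=True)

# A bare ndarray has no .corr method, hence the original AttributeError.
print(hasattr(normalized, 'corr'))  # False

# Option 1: wrap the array in a DataFrame, then use DataFrame.corr.
corr_pandas = pd.DataFrame(normalized).corr(method='pearson')

# Option 2: stay in NumPy; rowvar=False treats each column as a variable.
corr_numpy = np.corrcoef(normalized, rowvar=False)

print(np.allclose(corr_pandas.to_numpy(), corr_numpy))  # True
```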


jezrael


Thanks for your good answer. Another question: how is it possible to get only the top 3 scores for the feature 'F', so that you can see the top 3 correlations of 'F' at first glance? E.g. `top correlation to F: feature 3: -0.998906, feature 0: -0.994805, feature 1: -0.872187` – matthew Oct 26 '16 at 14:57

I think you need `print(pd.DataFrame(normalizeddf_train).corr(method='pearson').nsmallest(3, 5))` or `print(pd.DataFrame(normalizeddf_train).corr(method='pearson').nlargest(3, 5))`, where `3` is the number of values and `5` is the column name. Check also [`nsmallest`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.nsmallest.html) and [`nlargest`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.nlargest.html). – jezrael Oct 26 '16 at 15:01
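Note that `DataFrame.nsmallest(3, 5)` returns the three whole rows of the correlation matrix with the smallest values in column `5`. If you only want the three strongest correlations of one feature, one possible sketch is to take that single column as a Series, drop the feature's correlation with itself (always 1.0), and rank by absolute value so that strong positive and strong negative correlations are both considered. Ranking by `abs()` is a choice of this sketch, not something the comment above did; the toy data is the answer's, and the NumPy line stands in for `Normalizer()` with its default L2 norm so the snippet runs without scikit-learn.

```python
import numpy as np
import pandas as pd

df_train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9],
                         'D': [1, 3, 5], 'E': [5, 3, 6], 'F': [7, 4, 3]})

# Row-wise L2 normalization, equivalent to sklearn's Normalizer() default.
values = df_train.to_numpy(dtype=float)
normalized = pd.DataFrame(values / np.linalg.norm(values, axis=1, keepdims=True),
                          columns=df_train.columns)
corr = normalized.corr(method='pearson')

# Correlations of every other feature with 'F' (drop F's self-correlation).
corr_f = corr['F'].drop('F')

# Rank by strength regardless of sign and keep the three strongest.
top3 = corr_f.loc[corr_f.abs().nlargest(3).index]
print(top3)
```

If you only care about the most negative values, as in the example in the question, `corr_f.nsmallest(3)` is the signed variant.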
The next question is: how can I choose the columns A and D for my further predictive model? I'm asking because I only have indices, not column names. – matthew Nov 01 '16 at 13:38

Hmmm, you can add the `columns` parameter to the `DataFrame` constructor, like `print(pd.DataFrame(normalizeddf_train, columns=df_train.columns).corr(method='pearson'))`, and then you get the original column names in the output. – jezrael Nov 01 '16 at 13:40
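A short sketch of that suggestion, assuming the answer's toy `df_train`: passing `columns=df_train.columns` (and, optionally, `index=df_train.index`) keeps the original labels, so the correlation matrix is labelled A..F and the chosen features can be selected by name, e.g. as model input. The name `X` is only illustrative, and the NumPy line again stands in for `Normalizer()` with its default L2 norm.

```python
import numpy as np
import pandas as pd

df_train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9],
                         'D': [1, 3, 5], 'E': [5, 3, 6], 'F': [7, 4, 3]})

# Stand-in for Normalizer().fit_transform(df_train): a plain ndarray without labels.
values = df_train.to_numpy(dtype=float)
normalized_array = values / np.linalg.norm(values, axis=1, keepdims=True)

# Re-attach the original row and column labels.
normalized_df = pd.DataFrame(normalized_array,
                             columns=df_train.columns,
                             index=df_train.index)

print(normalized_df.corr(method='pearson'))  # rows and columns labelled A..F

# Select features by name instead of by position (hypothetical model input X).
X = normalized_df[['A', 'D']]
print(X.shape)  # (3, 2)
```

In scikit-learn 1.2 and later, transformers also offer `set_output(transform="pandas")`, which makes `fit_transform` return a labelled DataFrame directly; check your installed version before relying on it.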

Doesn't pandas' `corr` function automatically normalize the data linearly, so there is no need to do it beforehand? – mrbTT Sep 26 '18 at 20:55
@mrbTT - I have no idea; it seems not. – jezrael Sep 27 '18 at 05:05
I believe so because of Nick Cox's comment on this question: https://stats.stackexchange.com/questions/125259/normalize-variables-for-calculation-of-correlation-coefficient – mrbTT Sep 27 '18 at 13:56
@mrbTT - I am not a statistics guy, so I cannot help with this. – jezrael Sep 27 '18 at 13:58
Well, me neither; I was just posting here so someone could find it and reply, all the more because googling what the pandas corr function does gave me no results. – mrbTT Sep 27 '18 at 14:48
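On the question in these comments: there is no need to standardize columns before calling `corr`, because Pearson's r is unchanged by any positive linear rescaling of each variable separately, r(aX + b, cY + d) = r(X, Y) for a, c > 0. Column-wise scaling such as z-scores or min-max therefore leaves the matrix as it was. `Normalizer` is different: it rescales each row, so the correlations do change (in the toy data A and B are perfectly correlated, but not after normalization). A small sketch to check this on the answer's data, with a pandas expression standing in for `Normalizer()`'s default L2 row scaling:

```python
import numpy as np
import pandas as pd

df_train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9],
                         'D': [1, 3, 5], 'E': [5, 3, 6], 'F': [7, 4, 3]},
                        dtype=float)

corr_raw = df_train.corr(method='pearson')

# Column-wise rescaling: each feature is shifted and scaled on its own.
zscored = (df_train - df_train.mean()) / df_train.std()
minmax = (df_train - df_train.min()) / (df_train.max() - df_train.min())

# Row-wise rescaling, as sklearn's Normalizer() does by default (L2 norm per row).
rowwise = df_train.div(np.sqrt((df_train ** 2).sum(axis=1)), axis=0)

print(np.allclose(corr_raw, zscored.corr()))  # True: unchanged
print(np.allclose(corr_raw, minmax.corr()))   # True: unchanged
print(np.allclose(corr_raw, rowwise.corr()))  # False: row scaling changes r
```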
