Is there an equivalent of the pandas corr() function in Python datatable, to compute the correlation matrix of a Frame's columns? Thanks
- According to [the docs](https://datatable.readthedocs.io/en/latest/changelog/v0.10.0.html#general), a `corr()` function was added in `v0.10.0` in December last year "to compute the covariance and Pearson correlation coefficient between columns of a Frame". Is that what you're looking for? – G. Anderson Feb 10 '20 at 22:33
- Thanks Anderson. That helps for my current use case. However, what I was looking for is the corr() equivalent from Pandas, where we can do pandas_df.corr() to get the correlation matrix for ALL columns in one go, instead of having to specify each pairwise column. Thanks. – SJain Feb 12 '20 at 17:03
1 Answer
One option is to use the following function:
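import datatable as dt

def frame_corr(dt_frame):
    # Keep only the numeric columns
    numcols = [col for col in dt_frame if col.type.is_numeric]
    # Build one row per column: its correlation with every numeric column
    result = dt.rbind([dt_frame[:, [dt.corr(col1, col2) for col2 in numcols]] for col1 in numcols])
    result.names = dt_frame[:, numcols].names
    return result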
Input Data
import numpy as np

data = dt.Frame(x = np.random.normal(size=10),
                y = np.random.normal(size=10),
                z = np.random.normal(size=10)
)
Output
frame_corr(data)
   |         x          y          z
   |   float64    float64    float64
-- + ---------  ---------  ---------
 0 |  1         -0.880012   0.26132
 1 | -0.880012   1         -0.440515
 2 |  0.26132   -0.440515   1
[3 rows x 3 columns]
data.to_pandas().corr()
          x         y         z
x  1.000000 -0.880012  0.261320
y -0.880012  1.000000 -0.440515
z  0.261320 -0.440515  1.000000
Note: `is_numeric` is available in version 1.1.0.
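If the frame is small enough to copy into NumPy, another possible route is to compute the whole matrix in one call with `np.corrcoef`. The following is only a sketch under assumptions: `corr_matrix` is a hypothetical helper (not part of datatable), and the random array stands in for what `data.to_numpy()` would return for an all-numeric frame.

```python
import numpy as np

def corr_matrix(values, names):
    """Return (names, matrix) with Pearson correlations between the columns of `values`."""
    arr = np.asarray(values, dtype=float)
    # rowvar=False: each column is a variable, each row an observation
    matrix = np.corrcoef(arr, rowvar=False)
    return list(names), matrix

# Stand-in for data.to_numpy(): 10 observations of 3 numeric columns
rng = np.random.default_rng(0)
sample = rng.normal(size=(10, 3))

names, matrix = corr_matrix(sample, ["x", "y", "z"])
print(names)
print(matrix.round(6))
```

The result is a plain NumPy array, so the column names have to be carried alongside it. This trades datatable's out-of-memory friendliness for a single vectorized call, which may or may not suit large frames.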

langtang