I am trying to calculate the Pearson correlation for every possible pair of columns in a dataframe. I have 57997 columns, and I am getting a memory error.

t_logs = logs.T
print t_logs
results = t_logs.corr(method='pearson').applymap
print results

Here is the traceback:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-99-d0010a131d17> in <module>()
       5 print logs
       6 
 ----> 7 results = t_logs.corr(method='pearson')
       8 print results

C:\Users\nne1s\Anaconda2\lib\site-packages\pandas\core\frame.pyc in corr(self, method, min_periods)
   4938 
   4939         if method == 'pearson':
-> 4940             correl = libalgos.nancorr(_ensure_float64(mat), minp=min_periods)
   4941         elif method == 'spearman':
   4942             correl = libalgos.nancorr_spearman(_ensure_float64(mat),

pandas\_libs\algos.pyx in pandas._libs.algos.nancorr (pandas\_libs\algos.c:15501)()

MemoryError: 

Nneka Ede

1 Answer

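I don't think you want to calculate correlation coefficients for 57997 series. corr computes one coefficient for every pair of columns, so with 57997 columns the result is a 57997 x 57997 matrix, which is very likely what exhausts your memory.

If you're trying to get the correlation matrix for Log2 of Mean, Log2 of STD, etc., transpose the dataframe back before you run corr, so that those statistics are the columns: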

t_logs = t_logs.T
results = t_logs.corr(method='pearson')
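
As a rough sketch of why the orientation matters: corr returns one row and one column per input column, so the size of the result grows with the square of the column count. The helper name corr_matrix_bytes and the toy column names below are made up for illustration, and the estimate assumes a dense float64 result and ignores any temporary copies pandas makes.

```python
import numpy as np
import pandas as pd


def corr_matrix_bytes(n_columns, dtype=np.float64):
    """Bytes needed just to store an n_columns x n_columns result matrix."""
    return n_columns * n_columns * np.dtype(dtype).itemsize


# Size of the result when all 57997 series are treated as columns.
n_series = 57997
print("%.1f GB for the result alone" % (corr_matrix_bytes(n_series) / 1e9))

# Toy stand-in for the real data: 50 rows, 4 hypothetical statistic columns.
rng = np.random.RandomState(0)
logs = pd.DataFrame(
    rng.rand(50, 4),
    columns=["log2_mean", "log2_std", "log2_min", "log2_max"],
)

# Statistics as columns: a small 4 x 4 correlation matrix.
by_stat = logs.corr(method="pearson")
print(by_stat.shape)

# Transposed: every row becomes a column, so the result is 50 x 50.
by_row = logs.T.corr(method="pearson")
print(by_row.shape)
```

If the pairwise correlations between all 57997 series really are needed, the full matrix will not fit in typical RAM at once, so it would have to be computed and stored in pieces.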

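I'm not sure what you wanted to do with applymap here. Written without parentheses and a function argument, it is never called; results just ends up bound to the method itself instead of to the correlation matrix.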
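
For comparison, here is a minimal sketch of what calling applymap would look like, assuming the goal was some element-wise post-processing such as rounding (that goal is a guess, not something stated in the question). The toy frame and its column names are hypothetical. Newer pandas releases expose the same operation as DataFrame.map, so the sketch uses whichever one is available.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real frame.
rng = np.random.RandomState(0)
toy = pd.DataFrame(rng.rand(20, 3), columns=["a", "b", "c"])

results = toy.corr(method="pearson")

# Without parentheses this is only a reference to a method, not a DataFrame.
not_called = results.applymap if hasattr(results, "applymap") else results.map
print(callable(not_called))

# Calling it applies a function to every element. Prefer DataFrame.map where
# it exists and fall back to applymap on older pandas versions.
elementwise = results.map if hasattr(results, "map") else results.applymap
rounded = elementwise(lambda r: round(r, 2))
print(rounded)
```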

usernamenotfound