I am trying to correlate the same column in two different dataframes (same size). The dfs use stock data with a DatetimeIndex. Every correlation I have tried returns NaN. Is the dtype of the df index messing things up? Note: at this point in the program, I don't care what the dates / index actually are.

input:

import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like # temp fix

import numpy as np
import fix_yahoo_finance as yf

from pandas_datareader import data, wb
from datetime import date

df1 = yf.download('IBM', start = date (2000, 1, 3), end = date (2000, 1, 5), progress = False)
df2 = yf.download('IBM', start = date (2000, 1, 6), end = date (2000, 1, 10), progress = False)

print (df1)
print (df2)
print (df1['Open'].corr(df2['Open'])) 

output:

                Open    High      Low     Close  Adj Close    Volume
Date                                                                
2000-01-03  112.4375  116.00  111.875  116.0000  81.096031  10347700
2000-01-04  114.0000  114.50  110.875  112.0625  78.343300   8227800
2000-01-05  112.9375  119.75  112.125  116.0000  81.096031  12733200
              Open      High      Low  Close  Adj Close    Volume
Date                                                             
2000-01-06  118.00  118.9375  113.500  114.0  79.697784   7971900
2000-01-07  117.25  117.9375  110.625  113.5  79.348267  11856700
2000-01-10  117.25  119.3750  115.375  118.0  82.494217   8540500
nan
bud fox

1 Answer

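The indexes don't match, which is why you get NaN: Series.corr aligns the two Series on their index labels before computing, and your two date ranges share no dates, so nothing is left to correlate. Since you don't care about the dates, use numpy.corrcoef on the raw values to get your result: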

np.corrcoef(df1['Open'].values,df2['Open'].values)

Output

[[ 1.         -0.74615579]
 [-0.74615579  1.        ]]
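If you would rather stay in pandas, one option is to drop the date index before correlating. Below is a minimal sketch of that idea; it rebuilds the two Open columns by hand from the printed output in the question (so no download is needed), and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Open prices copied from the printed output in the question;
# the Series are rebuilt by hand here instead of downloaded.
s1 = pd.Series(
    [112.4375, 114.0, 112.9375],
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]),
)
s2 = pd.Series(
    [118.0, 117.25, 117.25],
    index=pd.to_datetime(["2000-01-06", "2000-01-07", "2000-01-10"]),
)

# Series.corr matches rows by index label; the dates do not overlap,
# so there are no pairs to correlate and the result is NaN.
aligned = s1.corr(s2)

# Dropping the dates gives both Series a 0..n-1 index, so rows pair up by position.
by_position = s1.reset_index(drop=True).corr(s2.reset_index(drop=True))

# Same number via NumPy: take the off-diagonal entry of the 2x2 matrix.
by_numpy = np.corrcoef(s1.to_numpy(), s2.to_numpy())[0, 1]

print(aligned, by_position, by_numpy)
```

Both position-based results should agree with the off-diagonal value in the matrix above. Pairing rows by position only makes sense when the two columns have the same length and the row order is the pairing you actually want.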
Yuca
  • Thanks Yuca, that worked perfectly. Can you tell me how to pull the "-0.746..." number out of the matrix? That is all I need. – bud fox Sep 10 '18 at 13:34
  • sure, either do `np.corrcoef(df1['Open'].values,df2['Open'].values)[0, 1]` or `np.corrcoef(df1['Open'].values,df2['Open'].values)[1, 0]` – Yuca Sep 10 '18 at 13:37