
I have the following code:

import pandas as pd
from pandas_datareader import data as web

df = web.DataReader('^GSPC', 'yahoo')
df['pct'] = df['Close'].pct_change()

dates_list = df.index[df['pct'].gt(0.002)]

df2 = web.DataReader('^GDAXI', 'yahoo')
df2['pct2'] = df2['Close'].pct_change()

I was trying to run this:

df2.loc[dates_list, 'pct2']

But I keep getting this error:

KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported,

I am guessing this is because df2 has no data for some of the dates in dates_list. To resolve this, I tried:

    idx1 = df.index
    idx2 = df2.index
    missing = idx2.difference(idx1)
    df.drop(missing, inplace = True)
    df2.drop(missing, inplace = True)

However, I am still getting the same error, and I don't understand why.

Henry Ecker
Slartibartfast
  • Are there index values in your dates_list that are not in df2? – rhug123 Jul 09 '20 at 00:10
  • Yes there are but I tried to remove them with difference() – Slartibartfast Jul 09 '20 at 00:11
  • Perhaps you could get around this problem by resetting the indexes on both dataframes and filtering the dates using `df.loc[df['dates'].isin(dates_list),'pct2']`. I do not believe the method you are trying will work if there are any index values in the date_list that are not in your df2, and although your fix might solve that problem, it's hard to say without seeing the data. – rhug123 Jul 09 '20 at 00:16

1 Answer


Note that dates_list was created from df, so it contains only dates that are present in the index of df.

Then you read df2 and attempt to retrieve pct2 from rows on just these dates.

But the index of df2 may not contain every date in dates_list (the two markets do not share the same trading holidays), and that is exactly what causes your exception.
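To make the mismatch concrete, here is a minimal sketch using small synthetic frames in place of the two downloads. The prices and dates are made up for illustration, and it assumes pandas 1.0 or later, where `.loc` raises on missing labels instead of warning.

```python
import pandas as pd

# Synthetic stand-ins for the two downloads (values invented for illustration).
df = pd.DataFrame(
    {"Close": [100.0, 101.0, 100.5, 102.0]},
    index=pd.to_datetime(["2020-07-01", "2020-07-02", "2020-07-03", "2020-07-06"]),
)
# df2 deliberately has no row for 2020-07-02.
df2 = pd.DataFrame(
    {"Close": [50.0, 50.5, 51.5]},
    index=pd.to_datetime(["2020-07-01", "2020-07-03", "2020-07-06"]),
)
df["pct"] = df["Close"].pct_change()
df2["pct2"] = df2["Close"].pct_change()

# Dates selected from df only; nothing guarantees they exist in df2.
dates_list = df.index[df["pct"].gt(0.002)]

# Labels requested from df2 that its index does not contain.
missing_in_df2 = dates_list.difference(df2.index)
print(missing_in_df2)

# A single missing label is enough for .loc to raise KeyError.
try:
    df2.loc[dates_list, "pct2"]
    raised = False
except KeyError:
    raised = True
print(raised)
```

Printing `dates_list.difference(df2.index)` on the real data is a quick way to see which dates trigger the error.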

To avoid it, retrieve only rows for dates that are present in the index of df2. To narrow the row specification down to these "allowed" dates, you should pass:

dates_list[dates_list.isin(df2.index)]

Run this alone and you will see the "allowed" dates (some dates will be eliminated).

So change the offending instruction to:

df2.loc[dates_list[dates_list.isin(df2.index)], 'pct2']
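Two other ways to get the same effect, sketched on made-up data rather than the real downloads: `Index.intersection` keeps only the shared dates, while `Series.reindex` keeps every requested date and fills the absent ones with NaN. Which one fits depends on whether you want the missing dates dropped or kept visible.

```python
import pandas as pd

# Hypothetical df2 with a pct2 column; values are invented for illustration.
df2 = pd.DataFrame(
    {"pct2": [0.01, -0.02, 0.03]},
    index=pd.to_datetime(["2020-07-01", "2020-07-03", "2020-07-06"]),
)
# Requested dates; 2020-07-02 is not in df2's index.
dates_list = pd.to_datetime(["2020-07-02", "2020-07-06"])

# Option 1: keep only the dates both indexes share, then select with .loc.
common = dates_list.intersection(df2.index)
strict = df2.loc[common, "pct2"]

# Option 2: keep all requested dates; dates absent from df2 become NaN.
padded = df2["pct2"].reindex(dates_list)

print(strict)
print(padded)
```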
Valdi_Bo