-1

I have two panda DataFrames:

Dataframe Yahoo:

date        ticker  return
2017-01-03  CRM     0.018040121229614625
2017-01-03  MSFT    -0.0033444816053511683
2017-01-04  CRM     0.024198086662915008
2017-01-04  MSFT    -0.0028809218950064386
2017-01-05  CRM     -0.0002746875429199269
2017-01-05  MSFT    0.0017687731146487362

Dataframe Quandl:

date        ticker  return
2017-01-03  CRM     0.018040120991250852
2017-01-03  MSFT    -0.003344466975803595
2017-01-04  CRM     0.024198103213211475
2017-01-04  MSFT    -0.0028809268004892363
2017-01-05  CRM     -0.00027464144673694513
2017-01-05  MSFT    0.0017687829680113065

I would like to get the standard deviation for the difference of Yahoo’s and Quandl’s 'return' data calculated across all ticker symbols for each day and data field.

How can I get that?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Alfonso_MA
  • 537
  • 5
  • 26
  • Is standard deviation calculated by grouping all dates for each ticker? Or do you expect just a single number from this process? – Mark Wang Sep 01 '19 at 19:31
  • Subtract one from the other while grouping by ticker (and date?) then get the stddev of the result. Or make a multiindex with date and ticker then subtract ... – wwii Sep 01 '19 at 19:48

1 Answers1

1

Start by merging the data:

  • df1 is Yahoo data
  • df2 is Quandl data
df = pd.merge(df1, df2, on=['date', 'ticker'], suffixes=('_yahoo', '_quandl'))

Create diff:

df['diff'] = df.return_yahoo - df.return_quandl

Output df:

enter image description here

Get std of ticker:

df.groupby('ticker')['diff'].std()

ticker
CRM     2.345815e-08
MSFT    1.018375e-08
Name: diff, dtype: float64
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158