Standard deviation for the difference of two dataframes with group by

Question

I have two panda DataFrames:

Dataframe Yahoo:

date        ticker  return
2017-01-03  CRM     0.018040121229614625
2017-01-03  MSFT    -0.0033444816053511683
2017-01-04  CRM     0.024198086662915008
2017-01-04  MSFT    -0.0028809218950064386
2017-01-05  CRM     -0.0002746875429199269
2017-01-05  MSFT    0.0017687731146487362

Dataframe Quandl:

date        ticker  return
2017-01-03  CRM     0.018040120991250852
2017-01-03  MSFT    -0.003344466975803595
2017-01-04  CRM     0.024198103213211475
2017-01-04  MSFT    -0.0028809268004892363
2017-01-05  CRM     -0.00027464144673694513
2017-01-05  MSFT    0.0017687829680113065

I would like to get the standard deviation for the difference of Yahoo’s and Quandl’s 'return' data calculated across all ticker symbols for each day and data field.

How can I get that?

Is standard deviation calculated by grouping all dates for each ticker? Or do you expect just a single number from this process? — Mark Wang, Sep 01 '19 at 19:31
Subtract one from the other while grouping by ticker (and date?) then get the stddev of the result. Or make a multiindex with date and ticker then subtract ... — wwii, Sep 01 '19 at 19:48

Trenton McKinney · Accepted Answer · 2019-09-01T19:59:55.950

1

Start by merging the data:

df1 is Yahoo data
df2 is Quandl data

df = pd.merge(df1, df2, on=['date', 'ticker'], suffixes=('_yahoo', '_quandl'))

Create `diff`:

df['diff'] = df.return_yahoo - df.return_quandl

Output `df`:

Get `std` of `ticker`:

df.groupby('ticker')['diff'].std()

ticker
CRM     2.345815e-08
MSFT    1.018375e-08
Name: diff, dtype: float64

edited Sep 01 '19 at 19:59

answered Sep 01 '19 at 19:53

Trenton McKinney

56,955
33
144
158

Standard deviation for the difference of two dataframes with group by

1 Answers1

Start by merging the data:

Create diff:

Output df:

Get std of ticker:

Create `diff`:

Output `df`:

Get `std` of `ticker`: