0

Suppose I have the following two dataframes:

np.random.seed(1)
annual = pd.DataFrame(data=np.random.random((2, 4)), index=index, columns=pd.period_range(start="2015", end="2018", freq="Y"))
quarterly = pd.DataFrame(data=np.random.random((2,3)), index=index, columns=pd.period_range('2019', freq='Q', periods=3))

Annual:

    2015        2016        2017        2018
A   0.417022    0.720324    0.000114    0.302333
B   0.146756    0.092339    0.186260    0.345561

Quarterly:

    2019Q1      2019Q2      2019Q3
A   0.396767    0.538817    0.419195
B   0.685220    0.204452    0.878117

Is it possible that I combine the two dataframes so that the resultant dataframe df looks like something below? If not, are there workarounds that allow me to merge the two dataframes so that I can do something like df['2019Q2'] - df['2018']?

    2015        2016        2017        2018        2019Q1      2019Q2      2019Q3
A   0.417022    0.720324    0.000114    0.302333    0.396767    0.538817    0.419195   
B   0.146756    0.092339    0.186260    0.345561    0.685220    0.204452    0.878117
Jonas
  • 534
  • 8
  • 16

1 Answers1

1

First concat with axis=1 and then necessary convert column names to strings if need processing later:

df = pd.concat([annual,quarterly], axis=1).rename(columns=str)
print (df)
       2015      2016      2017      2018    2019Q1    2019Q2    2019Q3
A  0.417022  0.720324  0.000114  0.302333  0.396767  0.538817  0.419195
B  0.146756  0.092339  0.186260  0.345561  0.685220  0.204452  0.878117

print (df.columns)
Index(['2015', '2016', '2017', '2018', '2019Q1', '2019Q2', '2019Q3'], dtype='object')

print (df['2019Q2'] - df['2018'])
A    0.236484
B   -0.141108
dtype: float64

If want working with Periods, it is possible, but more complicated:

df = pd.concat([annual,quarterly], axis=1)
print (df)
       2015      2016      2017      2018    2019Q1    2019Q2    2019Q3
A  0.417022  0.720324  0.000114  0.302333  0.396767  0.538817  0.419195
B  0.146756  0.092339  0.186260  0.345561  0.685220  0.204452  0.878117

print (df[pd.Period('2018', freq='A-DEC')])
A    0.302333
B    0.345561
Name: 2018, dtype: float64

print (df[pd.Period('2019Q2', freq='Q-DEC')])
A    0.538817
B    0.204452
Name: 2019Q2, dtype: float64

print (df[pd.Period('2019Q2', freq='Q-DEC')] - 
       df[pd.Period('2018', freq='A-DEC')])

IncompatibleFrequency: Input has different freq=A-DEC from Period(freq=Q-DEC)

Change name of Series for prevent error:

print (df[pd.Period('2019Q2', freq='Q-DEC')].rename('a') - 
       df[pd.Period('2018', freq='A-DEC')].rename('a'))

A    0.236484
B   -0.141108
Name: a, dtype: float64

In my opinion if need processing values latter with Periods the best is working with same frequency:

annual.columns = annual.columns.to_timestamp('Q').to_period('Q')
df = pd.concat([annual,quarterly], axis=1)
print (df)
     2015Q1    2016Q1    2017Q1    2018Q1    2019Q1    2019Q2    2019Q3
A  0.417022  0.720324  0.000114  0.302333  0.396767  0.538817  0.419195
B  0.146756  0.092339  0.186260  0.345561  0.685220  0.204452  0.878117

print (df[pd.Period('2019Q2', freq='Q-DEC')] - 
       df[pd.Period('2018Q1', freq='Q-DEC')])

A    0.236484
B   -0.141108
dtype: float64
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252