0

I am trying to merge a quarterly series and a monthly series, and in the process essentially "downsampling" the quarterly series. Both dataframes contain a DATE column, BANK, and the remaining columns are various values either in a monthly or quarterly format. The complication I have had is that it is a multiindex, so if I try:

merged_data=df1.join(df2).reset_index(['DATE', 'BANK_CODE']).ffill() 

the forward fill for quarterly data up to the last monthly datapoint is not done for each respective bank as I intended. Could anyone help with this please? Note: I have also tried to resample the quarterly dataframe separately, however I do not know of a way to downsample it to a monthly level until a certain date (should be the latest date in the monthly data).

df2 = df2.set_index(['DATE']).groupby(['BANK']).resample('M')['VALUE'].ffill()

df1:
Date       Bank    Value1  Value2
2021-06-30 bank 1  2000    7000
2021-07-31 bank 1  3000    2000
2021-06-30 bank 2  6000    9000

df2: 
Date       Bank    Value1  Value2
2021-06-30 bank 1  2000    5000
2021-09-30 bank 1  5000    4000
2021-06-30 bank 2  9000    10000

HERE IS A MINI EXAMPLE

RJ Adriaansen
  • 9,131
  • 2
  • 12
  • 26
  • Will you show before data and expected output data? – Golden Lion Dec 29 '21 at 22:50
  • Golden Lion, the before data is shown above (two dataframes). I am new to this platform and could not even show it properly until RJ helped me with that (thank you RJ). The expected output is the quarterly data to be on a monthly frequency, with the gaps being filled by the values. Hence why I tried resample to monthly and a forward fill in the code above. – Michael Lee Dec 30 '21 at 03:07

1 Answers1

0

Using the data provided, assuming df1 is monthly and df2 is quarterly.

Set index and resample your quarterly data to monthly:

# monthly data
x1 = df1.set_index(['Bank','Date'])
# quarterly data, resampling back to monthly
x2 = ( df2.set_index('Date')
   .groupby('Bank')
   .resample('M')
   .ffill()
   .drop(columns='Bank')
)

Merge both - I assume you want the product, not the union:

x1.join(x2, lsuffix='_m', rsuffix='_q', how='outer').fillna(0)

                   Value1_m  Value2_m  Value1_q  Value2_q
Bank   Date                                              
bank 1 2021-06-30    2000.0    7000.0      2000      5000
       2021-07-31    3000.0    2000.0      2000      5000
       2021-08-31       0.0       0.0      2000      5000
       2021-09-30       0.0       0.0      5000      4000
bank 2 2021-06-30    6000.0    9000.0      9000     10000

The _m suffices are the values from df1, _q are from df2. I'm assuming you'll know how to explain or deal with the differences between monthly and quarterly values on the same dates.

As you can see, no need to specify the interval, this is provided automatically.

kleynjan
  • 108
  • 5