2

I have a data frame with cumulative sums for various time periods and I would like to uniformly downsample to the monthly average. Sometimes I have a yearly summary, which I want to downsample to 1/12, quarterly data downsample to 1/3, and monthly data won't be downsampled at all. So if I have the following example

1   2017-12-31  600
2   2018-12-31  1200
3   2109-03-31  330
4   2019-04-30  125
5   2019-05-31  126
6   2019-06-30  127
7   2019-09-30  360
8   2020-01-31  480

I would like to achieve a split of sum(value in the period) / number of downsamples

date    value
2017-12-31  600
2018-01-31  100
2018-02-28  100
2018-03-31  100
2018-04-30  100
2018-05-31  100
2018-06-30  100
2018-07-31  100
2018-08-31  100
2018-09-30  100
2018-10-31  100
2018-11-30  100
2018-12-31  100
2019-01-31  120
2019-02-28  110
2019-03-31  110
2019-04-30  125
2019-05-31  126
2019-06-30  127
2019-07-31  120
2019-08-31  120
2019-09-30  120
2019-10-31  120
2019-11-30  120
2019-12-31  120
2020-01-31  120

Is there an internal pandas function or same simple custom function which can achieve this? I can imagine that I will backfill the values and divide by the size of the resample group, but I didn't find an easy way to do it yet.

Vaasha
  • 881
  • 1
  • 10
  • 19

1 Answers1

1

I have managed to solve this, though via a bit cumbersome approach. I would still appreciate a simple way.

  1. Input df
[In] print(df):
[Out]: 
date    value
2017-12-31  600
2018-12-31  1200
2019-03-31  330
2019-04-30  125
2019-05-31  126
2019-06-30  127
2019-09-30  360
2020-01-31  480
  1. Resample and backfill:
[In] res = df.set_index("date").resample("M").bfill()
[In] print(res)
[Out] 
date       value
2017-12-31 600
2018-01-31 1200
2018-02-28 1200
...
2019-11-30 480
2019-12-31 480
2020-01-31 480
  1. Find you which index dates belong to each group:
[In] 
resampled_groups = df.set_index("date").resample("M").groups
df_groups = pd.DataFrame(resampled_groups, index=["group"]).T.shift(1)
print(df_groups)
[Out]
2017-12-31 Nan
2018-01-31 1
2018-02-28 1
...
2018-12-31 1
2019-01-31 2
...
2019-04-30 3
2019-05-31 4
2019-06-30 5
2019-07-31 6
2019-08-31 6
2019-09-30 6
2019-10-31 7
...
  1. count occurrences in each group
[In]
s = df_groups.groupby("group").size()
s.name = "count"
s_counts = df_groups.join(s, on="group")["count"]
print(s_counts)
[Out]
2017-12-31 Nan
2018-01-31 12
2018-02-28 12
...
2018-12-31 12
2019-01-31 3
...
2019-04-30 1
2019-05-31 1
2019-06-30 1
2019-07-31 3
2019-08-31 3
2019-09-30 3
2019-10-31 4
  1. divide backfilled value with the count:
[In]
res = res.join(s_counts)
res["final_value"] = res["value"]/res["count"]
print(res)
[Out]
date       value count final_value
2017-12-31 600   NaN    NaN
2018-01-31 1200  12     100
2018-02-28 1200  12     100
...
2018-12-31 1200  12     100
2019-01-31 330   3      110
...
2019-04-30 125   1      125
2019-05-31 126   1      126
2019-06-30 127   1      127
2019-07-31 360   3      120
2019-08-31 360   3      120
2019-09-30 360   3      120
2019-10-31 480   4      120
...
2020-01-31 480   4      120
Vaasha
  • 881
  • 1
  • 10
  • 19