
I have a dataframe of orders received such as:

Date        Units
01-01-2020  2
02-01-2020  12
03-01-2020  NaN
04-01-2020  NaN
05-01-2020  5
06-01-2020  9
07-01-2020  NaN
08-01-2020  1

Sometimes customers order a larger quantity so they don't have to reorder for the next one or two days. I am trying to find the average number of units the customers use on any given day. For example, if a customer orders 12 and then does not order for 2 days, all three days' units should be changed to 4. The output would be:

Date        Units
01-01-2020  2
02-01-2020  4
03-01-2020  4
04-01-2020  4
05-01-2020  5
06-01-2020  4.5
07-01-2020  4.5
08-01-2020  1

I could do it with a combination of ffill and helper columns that count consecutive NaN values, but I feel there must be a more elegant way.

sebwallat

1 Answer

We can use notnull with cumsum to create the groupby key, then transform with mean:

df['Units'] = df['Units'].fillna(0).groupby(df['Units'].notnull().cumsum()).transform('mean')
0    2.0
1    4.0
2    4.0
3    4.0
4    5.0
5    4.5
6    4.5
7    1.0
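
A self-contained sketch of the approach, rebuilding the sample frame from the question (the dates are just labels here): each non-NaN row starts a new group, and the trailing NaN rows fall into the same group via cumsum, so the group mean spreads the bulk order evenly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-01-2020", "02-01-2020", "03-01-2020", "04-01-2020",
             "05-01-2020", "06-01-2020", "07-01-2020", "08-01-2020"],
    "Units": [2, 12, np.nan, np.nan, 5, 9, np.nan, 1],
})

# notnull().cumsum() labels each order and the NaN days it covers with
# the same group key: [1, 2, 2, 2, 3, 4, 4, 5]
key = df["Units"].notnull().cumsum()

# fillna(0) so the covered days contribute 0 to the sum but still count
# toward the group size, giving the per-day average.
df["Units"] = df["Units"].fillna(0).groupby(key).transform("mean")

print(df["Units"].tolist())  # [2.0, 4.0, 4.0, 4.0, 5.0, 4.5, 4.5, 1.0]
```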
BENY