1

Updated:

In this post, the first answer is very close to solve this problem too. However it does not take into account the column A and C.

Pandas Average If in Python : Combining groupby mean with conditional statement


There is a DataFrame with 3 columns. I would like to add 2 new columns which are:

  • the rolling avg of B by A and C (rolling 2 of the current and the previous row which are pass the statement - the same A and C)
  • the rolling avg of B by A and C (rolling 2 of the previous 2 which are pass the statement - the same A and C)

For the second part, I have date and a sequence which could be used as the basic of rolling avg calculation.

Any ideas?

df = pd.DataFrame({'A': ['t1', 't1', 't1', 't1', 't2', 't2', 't2', 't2','t1'],
                   'B': [100, 104, 108, 110, 102, 110, 98, 100, 200],
                   'C': ['h', 'a', 'a', 'a', 'a', 'h', 'h', 'h','h'],
                  'expected1': [100, 104, 106, 109, 102, 110, 104, 99, 150],
                  'expected2': [0, 0, 104, 106, 0, 0, 110, 104, 100]}, columns=['A', 'B', 'C','expected1','expected2'])

df
tomatoma37
  • 31
  • 4

1 Answers1

0

Use lazy group:

grp = df.groupby(['A', 'C'], sort=False)['B']
df['mean'] = grp.transform('mean')
df['mean_avg'] = grp.rolling(2, min_periods=1).mean().values

Output:

>>> df
    A    B  C        mean  mean_avg
0  t1  100  h  100.000000     100.0
1  t1  104  a  107.333333     104.0
2  t1  108  a  107.333333     106.0
3  t1  110  a  107.333333     109.0
4  t2  102  a  102.000000     110.0
5  t2  110  h  102.666667     104.0
6  t2   98  h  102.666667      99.0
7  t2  100  h  102.666667     102.0
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • Thanks, the mean looks good. but the rolling is not. The row 1 should be 104, row 2 106 row 3 109. Is it possible to change the order of calculation? – tomatoma37 Oct 23 '21 at 15:01
  • 1
    @tomatoma37. I updated my answer. I changed 2 things. The first is `sort=False` to `groupby` to ensure all values are aligned. The second is `min_periods=1` to have a value even if there are not 2 values to compute the mean. I hope I was clear :) – Corralien Oct 23 '21 at 15:12
  • I updated too :) The rolling calculation do not take into account the conditions. I would like to get the rolling-2, but not the last 2 rows but the rows which has t1/t2 and a/h. – tomatoma37 Oct 23 '21 at 15:52