In pandas: Interpolate between two rows such that the sum of interpolated values + second row = old value of the second row

Question

I found it really hard to articulate this problem, but it is a similar problem to this one.

For example, let's say I have a table like this:

Label	Value
A	1
B	2
C	Nan
D	Nan
E	Nan
F	12

It needs to look like this:

Label	Value
A	1
B	2
C	3
D	3
E	3
F	3

Used the next available value (12) and divided it by the number of Nan values + 1 (12/4) = 3 and replaced the Nan values AND the original value used to interpolate(12) by 3. It is similar to the previous question but also modifies the original value used for interpolation.

test = pd.DataFrame({'Label': ['A', 'B', 'C', 'D', 'E', 'F','G','H','I'],
                     'Value': [1, 2, None, None, None, 12,None,None,4]})

test['break'] = np.where(test['Value'].notnull(),1,0)
test['group'] = test['break'].shift().fillna(0).cumsum()
test['Value2'] = test.groupby('group').Value.apply(lambda x: x.fillna( x.max() / len(x)))

for row in range(0,test.shape[0]):
    if test['break'].iloc[row] == 0 and test['break'].iloc[row+1] == 1:
        test.at[row+1, 'Value2'] = test['Value2'].iloc[row]

df.interpolate() is not capable of doing this, and this is what I have so far. It gets the job done, but is not very elegant

Pierre D · Accepted Answer · 2022-12-14T18:44:42.897

Perhaps something like this?

test = pd.DataFrame({'Label': ['A', 'B', 'C', 'D', 'E', 'F','G','H','I'],
                     'Value': [1, 2, None, None, None, 12,None,None,4]})

tr = test.assign(
    g=(~test['Value'].isna())[::-1].cumsum()
).groupby('g')['Value'].transform

df = test.assign(Value=tr('last') / tr('size'))

>>> df
  Label     Value
0     A  1.000000
1     B  2.000000
2     C  3.000000
3     D  3.000000
4     E  3.000000
5     F  3.000000
6     G  1.333333
7     H  1.333333
8     I  1.333333

Explanation

With the .assign(g=...) we make groups of values that end with a non-NaN values and preceded by zero or more NaN:

>>> test.assign(
...     g=(~test['Value'].isna())[::-1].cumsum()
... )

  Label  Value  g
0     A    1.0  4
1     B    2.0  3
2     C    NaN  2
3     D    NaN  2
4     E    NaN  2
5     F   12.0  2
6     G    NaN  1
7     H    NaN  1
8     I    4.0  1

Then, we use .groupby('g')['Values'].transform twice: to get the last() value, and to divide it by the size() of the group.

A far more elegant way than I could ever have managed. Thank you very much! — ppoudel, Jan 03 '23 at 16:27

score 1 · Answer 2 · answered Dec 14 '22 at 22:44

In the similar vain as the very good answer by Pierre D, you could also use the following solution for the purpose of grouping in one go:

test['Value'] = ((test.assign(id = (~test['Value'].isna()).cumsum().mask(test['Value'].isna())
                              .bfill()))
 .groupby('id').transform(lambda x: x.bfill() / len(x)))

  Label     Value
0     A  1.000000
1     B  2.000000
2     C  3.000000
3     D  3.000000
4     E  3.000000
5     F  3.000000
6     G  1.333333
7     H  1.333333
8     I  1.333333

In pandas: Interpolate between two rows such that the sum of interpolated values + second row = old value of the second row

2 Answers2

Explanation