2

I have the following dataframe called df,

date          flag1 flag2 flag3 flag4…
2020-12-31
2021-01-01                          
2021-01-02                   1
2021-01-03
2021-01-04
2021-01-05            1                
2021-01-06                        1
2021-01-07
2021-01-08
2021-01-09
2021-01-10
2021-01-11     1                  1
2021-01-12 

I want to do a backfill when a 1 appears in any column, and fill backwards until a number appears or failing that, backfill til a set number.

So let’s say the set number to reduce o to is 0 and the decrement is 0.1, it should look like this,

date         flag1  flag2  flag3  flag4…
2020-12-31           0.5    0.8    0.4
2021-01-01   0.0     0.6    0.9    0.5
2021-01-02   0.1     0.7    1.0    0.6
2021-01-03   0.2     0.8           0.7
2021-01-04   0.3     0.9           0.8
2021-01-05   0.4     1.0           0.9
2021-01-06   0.5                   1.0
2021-01-07   0.6                   0.6
2021-01-08   0.7                   0.7
2021-01-09   0.8                   0.8
2021-01-10   0.9                   0.9
2021-01-11   1.0                   1.0
2021-01-12 

Can this be achieved with pandas? I want to be able to set the decrement amount and the limit for example the above would be 0.1 and 0.

I know that this command can increment the values backwards:

df1 = df1[::-1].fillna(method='ffill')
(df1 + (df1 == df1.shift()).cumsum()).sort_index()

But that’s not what I want.

AloneTogether
  • 25,814
  • 5
  • 20
  • 39
anarchy
  • 3,709
  • 2
  • 16
  • 48

2 Answers2

2

You could also try using iloc to change the values based on the indices where the column value is equals to 1.0:

import pandas as pd
import numpy as np

def process_data(c, n):
  for idx in reversed(np.where(c==1)[0]):
    c.iloc[np.arange(idx)[::-1][:n.shape[0]]] = n[idx-1::-1][::-1]
    c.iat[idx] = 1.0
  return c
df = df.apply(lambda r: process_data(r, np.linspace(1.0, 0.0, num=11)[1:]))
             flag1  flag2  flag3  flag4
date                                   
2020-12-31     NaN    0.5    0.8    0.4
2021-01-01     0.0    0.6    0.9    0.5
2021-01-02     0.1    0.7    1.0    0.6
2021-01-03     0.2    0.8    NaN    0.7
2021-01-04     0.3    0.9    NaN    0.8
2021-01-05     0.4    1.0    NaN    0.9
2021-01-06     0.5    NaN    NaN    1.0
2021-01-07     0.6    NaN    NaN    0.6
2021-01-08     0.7    NaN    NaN    0.7
2021-01-09     0.8    NaN    NaN    0.8
2021-01-10     0.9    NaN    NaN    0.9
2021-01-11     1.0    NaN    NaN    1.0
2021-01-12     NaN    NaN    NaN    NaN
AloneTogether
  • 25,814
  • 5
  • 20
  • 39
1

First are created groups by cumulative sums with ffill and limit parameter and then per each group are subtract from right side 10, divide 10 and set NaN if original values is missing:

decr = 0.1
vals = 10

f = lambda x: x.groupby(x).cumcount(ascending=False).rsub(vals).mul(decr).where(x.notna())
df1 = df1.cumsum()[::-1].ffill(limit=vals)[::-1].apply(f)
print (df1)
            flag1  flag2  flag3  flag4
date                                  
2020-12-31    NaN    0.5    0.8    0.4
2021-01-01    0.0    0.6    0.9    0.5
2021-01-02    0.1    0.7    1.0    0.6
2021-01-03    0.2    0.8    NaN    0.7
2021-01-04    0.3    0.9    NaN    0.8
2021-01-05    0.4    1.0    NaN    0.9
2021-01-06    0.5    NaN    NaN    1.0
2021-01-07    0.6    NaN    NaN    0.6
2021-01-08    0.7    NaN    NaN    0.7
2021-01-09    0.8    NaN    NaN    0.8
2021-01-10    0.9    NaN    NaN    0.9
2021-01-11    1.0    NaN    NaN    1.0
2021-01-12    NaN    NaN    NaN    NaN

More general solution:

decr = 0.1
start = 1
end = 0

r = np.arange(start, end, -decr)
print (r)
[1.  0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1]

vals = len(r)

f = lambda x: x.groupby(x).cumcount(ascending=False).rsub(vals).mul(decr).where(x.notna())
df1 = df1.where(df1.eq(1)).cumsum()[::-1].ffill(limit=vals)[::-1].apply(f)
print (df1)
            flag1  flag2  flag3  flag4
date                                  
2020-12-31    NaN    0.5    0.8    0.4
2021-01-01    0.0    0.6    0.9    0.5
2021-01-02    0.1    0.7    1.0    0.6
2021-01-03    0.2    0.8    NaN    0.7
2021-01-04    0.3    0.9    NaN    0.8
2021-01-05    0.4    1.0    NaN    0.9
2021-01-06    0.5    NaN    NaN    1.0
2021-01-07    0.6    NaN    NaN    0.6
2021-01-08    0.7    NaN    NaN    0.7
2021-01-09    0.8    NaN    NaN    0.8
2021-01-10    0.9    NaN    NaN    0.9
2021-01-11    1.0    NaN    NaN    1.0
2021-01-12    NaN    NaN    NaN    NaN
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252