
I have a DataFrame with (several) grouping variables and (several) value variables. My goal is to set the last n non-NaN values in each group to NaN. So let's take a simple example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
df
Out[1]:
   id  value
0   1    1.0
1   1    2.0
2   1    NaN
3   2    9.0
4   2    8.0

The desired result for n=1 would look like the following:

Out[53]:
   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN
TiTo

3 Answers


Use groupby().cumcount() together with the per-group size:

N = 1
groups = df.loc[df['value'].notna()].groupby('id')
enum = groups.cumcount()                   # position of each non-NaN value within its group
sizes = groups['value'].transform('size')  # number of non-NaN values per group

df['value'] = df['value'].where(enum < sizes - N)

Output:

   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN
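If this needs to run for several value columns or different values of n, the logic above can be wrapped in a small helper (a sketch; the name `mask_last_n` is made up here, not part of pandas):

```python
import numpy as np
import pandas as pd

def mask_last_n(df, group_col, value_col, n):
    # Enumerate the non-NaN rows within each group and compare against
    # the group's non-NaN count to drop the last n valid values.
    groups = df.loc[df[value_col].notna()].groupby(group_col)
    enum = groups.cumcount()
    sizes = groups[value_col].transform('size')
    out = df.copy()
    out[value_col] = out[value_col].where(enum < sizes - n)
    return out

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
print(mask_last_n(df, 'id', 'value', 1))
```

Returning a copy keeps the original frame intact, so the same helper can be applied with different n values side by side.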
Quang Hoang

You can reverse the Series and take a cumulative sum after groupby to count how many non-NA values remain from each row to the end of its group:

df['value'] = df['value'].where(df['value'].notna().iloc[::-1].groupby(df['id']).cumsum() > 1)
df
Out[86]: 
   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN
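To see why the `> 1` comparison works: after reversing, the groupwise cumsum gives, for each row, the number of non-NA values from that row to the end of its group, so only rows with a count greater than N keep their value. A self-contained illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
# Reverse the notna flags and cumsum within each id; the result
# (aligned back by index) counts non-NA values from each row onward.
counts = df['value'].notna().iloc[::-1].groupby(df['id']).cumsum()
print(counts.sort_index())
```

On this sample, the counts sorted by index are 2, 1, 0, 2, 1, so only rows 0 and 3 satisfy `> 1` and keep their value.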
BENY

One option: create a reversed cumcount on the non-NA values:

N = 1
m = (df
     .loc[df['value'].notna()]
     .groupby('id')
     .cumcount(ascending=False)  # 0 for the last non-NA row of each group
     .lt(N)
     )
df.loc[m[m].index, 'value'] = np.nan
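For reference, on the sample data the mask m is True exactly on rows 1 and 4 (the last non-NA row of each group); a self-contained check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
N = 1
# cumcount(ascending=False) numbers each group's non-NA rows from the
# end (the last one gets 0), so .lt(N) flags the last N non-NA rows.
m = (df
     .loc[df['value'].notna()]
     .groupby('id')
     .cumcount(ascending=False)
     .lt(N))
print(m)
```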

Similar approach with boolean masking:

m = df['value'].notna()
df['value'] = df['value'].mask(m[::-1].groupby(df['id']).cumsum().le(N))

Output:

   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN
mozway