I have three columns of data: the first is a category id, and the second and third contain some missing values. I want to group by the first column and then, within each group, fill the missing values in the third column using the 'ffill' (forward fill) method.

I found a promising idea in "Pandas: filling missing values by weighted average in each group!", but it did not solve my problem because the output it produced was not what I wanted.

Run the following code to build an example:

import pandas as pd
import numpy as np
df = pd.DataFrame({'name': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'value': [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
                   'sss': [1, np.nan, 3, np.nan, np.nan, np.nan, 2, np.nan, np.nan]})
df
Out[13]:
    name    value   sss
0   A      1.0     1.0
1   A      NaN     NaN
2   B      NaN     3.0
3   B      2.0     NaN
4   B      3.0     NaN
5   B      1.0     NaN
6   C      3.0     2.0
7   C      NaN     NaN
8   C      3.0     NaN

The goal is to fill each missing value in 'sss' with the previous value within its group.

Then I ran the following code, but it outputs strange results:

df["sss"] = df.groupby("name").transform(lambda x: x.fillna(axis = 0,method = 'ffill'))
df
Out[13]:
    name    value   sss
0   A      1.0     1.0
1   A      NaN     1.0
2   B      NaN     NaN
3   B      2.0     2.0
4   B      3.0     3.0
5   B      1.0     1.0
6   C      3.0     3.0
7   C      NaN     3.0
8   C      3.0     3.0
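
A quick check that may explain this output: when no column is selected, `groupby("name").transform(...)` returns a DataFrame containing every non-key column, not just 'sss'. The sketch below rebuilds the example frame and inspects that result. The variable name `filled` is only illustrative, and `x.ffill()` is used as a shorthand for `fillna(method='ffill')`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "value": [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
    "sss": [1, np.nan, 3, np.nan, np.nan, np.nan, 2, np.nan, np.nan],
})

# No column is selected, so both 'value' and 'sss' are forward filled per group.
filled = df.groupby("name").transform(lambda x: x.ffill())

print(list(filled.columns))       # the columns returned by transform
print(filled["value"].tolist())   # the forward-filled 'value' column
```

The forward-filled 'value' column has the same numbers as the 'sss' column in the strange output above, which suggests that the first returned column was the one assigned to `df["sss"]`. How such an assignment behaves may depend on the pandas version.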

The result I want is this:

Out[13]:
    name    value   sss
0   A      1.0     1.0
1   A      NaN     1.0
2   B      NaN     3.0
3   B      2.0     3.0
4   B      3.0     3.0
5   B      1.0     3.0
6   C      3.0     2.0
7   C      NaN     2.0
8   C      3.0     2.0
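
One way this target could be reached, offered as a minimal sketch and not as the only approach: select the 'sss' column from the grouped object before forward filling, so that only that column is filled and assigned back. The example frame is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "value": [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
    "sss": [1, np.nan, 3, np.nan, np.nan, np.nan, 2, np.nan, np.nan],
})

# Select 'sss' first, then forward fill within each 'name' group.
df["sss"] = df.groupby("name")["sss"].ffill()

print(df)
```

With this selection the 'value' column is left untouched, so its missing values remain.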

Can someone point out where I am going wrong?

罗文浩