0

I have the following df with a lot more number columns. I now want to make a forward filling for all the columns in the dataframe but grouped by id.

id  date  number    number2
1   2001    4         11
1   2002    4         45
1   2003    NaN       13
2   2001    7         NaN
2   2002    8         2

The result should look like this:

id  date  number    number2
1   2001    4         11
1   2002    4         45
1   2003    4         13
2   2001    7         NaN
2   2002    8         2

I tried the following command:

df= df.groupby("id").fillna(method="ffill", limit=2)

However, this raises a KeyError "isin". Filling just one column with the following command works just fine, but how can I efficiently forward fill the whole df grouped by isin?

df["number"]= df.groupby("id")["number"].fillna(method="ffill", limit=2)
freddy888
  • 956
  • 3
  • 18
  • 39

2 Answers2

1

You can use:

df = df.groupby("id").apply(lambda x: x.ffill(limit=2))
print (df)
   id  date  number  number2
0   1  2001     4.0     11.0
1   1  2002     4.0     45.0
2   1  2003     4.0     13.0
3   2  2001     7.0      NaN
4   2  2002     8.0      2.0

Also for me working:

df.groupby("id").fillna(method="ffill", limit=2)

so I think is necessary upgrade pandas.

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, I did upgrade pandas and the fillna command works. However, it also drops the id column in the new dataframe. How could I avoid that? – freddy888 Jan 22 '18 at 15:55
  • Interesting, for me it working nice. `.reset_index()` does work? – jezrael Jan 22 '18 at 16:02
  • No, still drops the id column. But the apply function works for me. – freddy888 Jan 22 '18 at 16:07
  • @FriedrichFranz Same here. groupby.fillna drops the id column but groupby.ffill or groupby.bfill seem to not do that. Seems like inconsistent behaviour. – Roobie Nuby Mar 21 '18 at 22:49
0

ffill can be use directly

df.groupby('id').ffill(2)
Out[423]: 
   id  date  number  number2
0   1  2001     4.0     11.0
1   1  2002     4.0     45.0
2   1  2003     4.0     13.0
3   2  2001     7.0      NaN
4   2  2002     8.0      2.0

#isin

#df.loc[:,df.columns.isin([''])]=df.loc[:,df.columns.isin([''])].groupby('id').ffill(2)
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Hi @beny, I am trying to use kinda same method for planets dataset for imputation, but it's not imputing all the values. Not sure why: https://stackoverflow.com/questions/73449902/fill-in-missing-values-with-groupby/73450241 – Roy Aug 26 '22 at 19:00