This probably sounds like a really stupid question, and I think the answer should be simple.

Yet I've been stuck on this problem for weeks and still can't solve it.

What I need is to apply something like fillna(method="ffill") within groups. Consider the DataFrame below:

import pandas as pd

df = pd.read_csv("something.csv")
>>> df
    group date        price
0   1     2021-12-01  .15
1   1     2022-01-15  NaN
2   1     2021-02-03  .35
3   2     2021-12-01  NaN
4   2     2021-12-15  2.5
5   2     2022-02-03  NaN
6   3     2021-11-15  3.25
6   3     2021-12-03  NaN
6   3     2022-01-06  NaN

The desired output is:

>>> df
    group date        price
0   1     2021-12-01  .15
1   1     2022-01-15  .15
2   1     2021-02-03  .35
3   2     2021-12-01  NaN
4   2     2021-12-15  2.5
5   2     2022-02-03  2.5
6   3     2021-11-15  3.25
6   3     2021-12-03  3.25
6   3     2022-01-06  3.25

What I have tried so far:

df["price"] = df.groupby(["group", "date"])["price"].ffill()
df["price"] = df.groupby(["group", "date"])["price"].fillna(method="ffill")
df = df.groupby(["group", "date"]).fillna(method="ffill")
df["price"] = df.groupby(["group", "date"])["price"].apply(lambda x :x.ffill())

I also tried several similar answers I found, but none of them gave the expected result.

Renan Klehm

1 Answer

I think you missed the simple:

df['price'] = df.groupby('group')['price'].ffill()

Indeed, you should not group by date: each (group, date) pair is unique here, so every group would have size 1 and there would be no earlier row to fill from.

Output (written to a separate column for clarity):

   group        date  price  price_ffill
0      1  2021-12-01   0.15         0.15
1      1  2022-01-15    NaN         0.15
2      1  2021-02-03   0.35         0.35
3      2  2021-12-01    NaN          NaN
4      2  2021-12-15   2.50         2.50
5      2  2022-02-03    NaN         2.50
6      3  2021-11-15   3.25         3.25
6      3  2021-12-03    NaN         3.25
6      3  2022-01-06    NaN         3.25
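
For a self-contained check, here is a minimal sketch that rebuilds the example frame by hand instead of reading something.csv (the values are copied from the question; the names sizes and max_group_size are only illustrative, not part of the original post). It also shows that grouping by both group and date leaves one row per group, which would explain why the attempts in the question changed nothing.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame by hand (assumption: the CSV holds these values).
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "date": ["2021-12-01", "2022-01-15", "2021-02-03",
             "2021-12-01", "2021-12-15", "2022-02-03",
             "2021-11-15", "2021-12-03", "2022-01-06"],
    "price": [0.15, np.nan, 0.35, np.nan, 2.5, np.nan, 3.25, np.nan, np.nan],
})

# Grouping by both keys gives single-row groups, so there is nothing to fill from.
sizes = df.groupby(["group", "date"]).size()
max_group_size = int(sizes.max())

# Grouping by "group" alone carries the last seen price forward within each group.
df["price_ffill"] = df.groupby("group")["price"].ffill()

print(max_group_size)
print(df)
```

Note that ffill follows the current row order, not the dates. If the fill should be chronological, the frame would need to be sorted by date first; the desired output in the question keeps the original order, so no sorting is done here.
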
mozway