2

Here is an exctract of my dataset : Dataset

Here is an exemple of dataset

df = pd.DataFrame(
{'vals': np.where(np.arange(35) < 30, np.arange(35), np.nan)},
index=pd.date_range('2021-01-01', freq='12H', periods=35))

                    vals
2021-01-01 00:00:00 0.0
2021-01-01 12:00:00 1.0
2021-01-02 00:00:00 2.0
2021-01-02 12:00:00 3.0
2021-01-03 00:00:00 4.0
2021-01-03 12:00:00 5.0
2021-01-04 00:00:00 6.0
2021-01-04 12:00:00 7.0
2021-01-05 00:00:00 8.0
2021-01-05 12:00:00 9.0
2021-01-06 00:00:00 10.0
2021-01-06 12:00:00 11.0
2021-01-07 00:00:00 12.0
2021-01-07 12:00:00 13.0
2021-01-08 00:00:00 14.0
2021-01-08 12:00:00 15.0
2021-01-09 00:00:00 16.0
2021-01-09 12:00:00 17.0
2021-01-10 00:00:00 18.0
2021-01-10 12:00:00 19.0
2021-01-11 00:00:00 20.0
2021-01-11 12:00:00 21.0
2021-01-12 00:00:00 22.0
2021-01-12 12:00:00 23.0
2021-01-13 00:00:00 24.0
2021-01-13 12:00:00 25.0
2021-01-14 00:00:00 26.0
2021-01-14 12:00:00 27.0
2021-01-15 00:00:00 28.0
2021-01-15 12:00:00 29.0
2021-01-16 00:00:00 NaN
2021-01-16 12:00:00 NaN
2021-01-17 00:00:00 NaN
2021-01-17 12:00:00 NaN
2021-01-18 00:00:00 NaN

and for the result i would like to have :

                        vals
    2021-01-01 00:00:00 0.0
    2021-01-01 12:00:00 1.0
    2021-01-02 00:00:00 2.0
    2021-01-02 12:00:00 3.0
    2021-01-03 00:00:00 4.0
    2021-01-03 12:00:00 5.0
    2021-01-04 00:00:00 6.0
    2021-01-04 12:00:00 7.0
    2021-01-05 00:00:00 8.0
    2021-01-05 12:00:00 9.0
    2021-01-06 00:00:00 10.0
    2021-01-06 12:00:00 11.0
    2021-01-07 00:00:00 12.0
    2021-01-07 12:00:00 13.0
    2021-01-08 00:00:00 14.0
    2021-01-08 12:00:00 15.0
    2021-01-09 00:00:00 16.0
    2021-01-09 12:00:00 17.0
    2021-01-10 00:00:00 18.0
    2021-01-10 12:00:00 19.0
    2021-01-11 00:00:00 20.0
    2021-01-11 12:00:00 21.0
    2021-01-12 00:00:00 22.0
    2021-01-12 12:00:00 23.0
    2021-01-13 00:00:00 24.0
    2021-01-13 12:00:00 25.0
    2021-01-14 00:00:00 26.0
    2021-01-14 12:00:00 27.0
    2021-01-15 00:00:00 28.0
    2021-01-15 12:00:00 29.0
    2021-01-16 00:00:00 16.0
    2021-01-16 12:00:00 17.0
    2021-01-17 00:00:00 18.0
    2021-01-17 12:00:00 19.0
    2021-01-18 00:00:00 20.0

My question:

I would like to fill NaN value with a value from the same column that has been observed a week before.

df.fillna(method='ffill') is not helping as it fills based on the last value. Any idea?

Thomas LESIEUR
  • 408
  • 4
  • 14
  • 1
    Please take a while to read ["How to create a Minimal, Reproducible Example"](https://stackoverflow.com/help/minimal-reproducible-example) – Corralien Aug 01 '21 at 19:39
  • You mean, for example, if you're missing a value on a Tuesday, you'd like to ffill from the most recent observed Tuesday value, regardless of whether there are other more recent observations? – Michael Delgado Aug 01 '21 at 20:51
  • Thanks Corralien, i will have a look. – Thomas LESIEUR Aug 02 '21 at 09:45
  • Thanks for the question and clarification Michel. Imean, if you're missing a value on a Tuesday at 8:00,I'd like to ffill from the most recent observed Tuesday at 8:00 value, regardless of whether there are other more recent observations. – Thomas LESIEUR Aug 02 '21 at 09:47
  • I edited the initial question. – Thomas LESIEUR Aug 02 '21 at 09:47

1 Answers1

3

Simple example of a DataFrame with missing values and a datetime index:

In [2]: df = pd.DataFrame(
   ...:     {'vals': np.where(np.arange(21) < 14, np.arange(21), np.nan)},    
   ...:     index=pd.date_range('2021-01-01', freq='D', periods=21),
   ...: )
   ...:

In [3]: df
Out[3]:
            vals
2021-01-01   0.0
2021-01-02   1.0
2021-01-03   2.0
2021-01-04   3.0
2021-01-05   4.0
2021-01-06   5.0
2021-01-07   6.0
2021-01-08   7.0
2021-01-09   8.0
2021-01-10   9.0
2021-01-11  10.0
2021-01-12  11.0
2021-01-13  12.0
2021-01-14  13.0
2021-01-15   NaN
2021-01-16   NaN
2021-01-17   NaN
2021-01-18   NaN
2021-01-19   NaN
2021-01-20   NaN
2021-01-21   NaN

You can group on the week day using pandas datetime components, then use ffill to forward fill within each group:

In [4]: df.groupby(df.index.weekday).ffill()
Out[4]:
            vals
2021-01-01   0.0
2021-01-02   1.0
2021-01-03   2.0
2021-01-04   3.0
2021-01-05   4.0
2021-01-06   5.0
2021-01-07   6.0
2021-01-08   7.0
2021-01-09   8.0
2021-01-10   9.0
2021-01-11  10.0
2021-01-12  11.0
2021-01-13  12.0
2021-01-14  13.0
2021-01-15   7.0
2021-01-16   8.0
2021-01-17   9.0
2021-01-18  10.0
2021-01-19  11.0
2021-01-20  12.0
2021-01-21  13.0
Michael Delgado
  • 13,789
  • 3
  • 29
  • 54