0

I need help writing a loop that will fill in my missing data.

Suppose I have a pandas data frame like this:

import pandas as pd
import numpy as np
df = pd.DataFrame([['A', '0', 'Apple', 2],
                   ['A', '1', '', 3],
                   ['B', '2', 'Grape', 2],
                   ['B', '3', 'Banana', 1],
                   ['B', '4', np.nan, np.nan],
                   ['B', '5', np.nan, np.nan]
                   ], columns = ['Index1', 'Index2', 'Value1', 'Value2'])

If a 'ValueX' field is blank or displaying nan, I want to fill up the value with a value associated with the previous entry(ies) where previous entry is identified as Index1 is the same, and Index2 is one(or more) less than the current.

For example, for Index1 = 'A' and Index2 = '1', I want to fill with 'Apple' since that's the value associated with A0.
Similarly, I want to fill B4 with Banana, 1 and B5 with Banana, 1.

    Index1  Index2  Value1  Value2
0   A       0       Apple   2.0
1   A       1               3.0
2   B       2       Grape   2.0
3   B       3       Banana  1.0
4   B       4       NaN     NaN
5   B       5       NaN     NaN

will be:

        Index1  Index2  Value1  Value2
    0   A       0       Apple   2.0
    1   A       1       Apple   3.0
    2   B       2       Grape   2.0
    3   B       3       Banana  1.0
    4   B       4       Banana  1.0
    5   B       5       Banana  1.0

The actual dataset actually has roughly 20 'value' fields that I would want to potentially replace.

Thanks in advance for your help.

Farellia
  • 187
  • 1
  • 2
  • 14

1 Answers1

2

IIUC, what you want is:

df.replace('',np.nan).groupby('Index1').ffill()

output:

  Index1 Index2  Value1  Value2
0      A      0   Apple     2.0
1      A      1   Apple     3.0
2      B      2   Grape     2.0
3      B      3  Banana     1.0
4      B      4  Banana     1.0
5      B      5  Banana     1.0
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Thanks - does this mean that I have to order by dataframe by Index1 and Index2 first to get the correct answer? Since the code doesn't seem to take into account index2 at all – Farellia Jul 22 '19 at 18:41
  • Yes, you need to order the dataframe by `Index2`. `groupby` will take care of `Index1`. Besides, you don't fill across `Index1` so sorting on `Index1` is irrelevant anyway. – Quang Hoang Jul 22 '19 at 18:58