0

This is my dataframe, I have a multiindex on time and ID.

               +------+-------+----------+-----------+
               | col1 | col2  | col3     | col4      |
+-------+------+------+-------+----------+-----------+
| ID    | t    |      |       |          |           |
+-------+------+------+-------+----------+-----------+
| id1   | t1   | 10   | nan   |    nan   |    1      |
| id1   | t2   | 10   | 110   |      1   |    nan    |
| id1   | t3   | 12   | nan   |    nan   |    nan    |
| id2   | t1   | 12   | 109   |     15   |    1      |
| id2   | t4   | 12   | 109   |    nan   |    1      |
| id2   | t7   | 20   | nan   |    nan   |    nan    |
+-------+------+------+-------+----------+-----------+

Is it possible to do a multi-index fwd fill only on col3 and col4?

               +------+-------+----------+-----------+
               | col1 | col2  | col3     | col4      |
+-------+------+------+-------+----------+-----------+
| ID    | t    |      |       |          |           |
+-------+------+------+-------+----------+-----------+
| id1   | t1   | 10   | nan   |    nan   |     1     |
| id1   | t2   | 10   | 110   |    1     |     1     |
| id1   | t3   | 12   | nan   |    1     |     1     |
| id2   | t1   | 12   | 109   |    15    |     1     |
| id2   | t4   | 12   | 109   |    15    |     1     |
| id2   | t7   | 20   | nan   |    15    |     1     |
+-------+------+------+-------+----------+-----------+

What I've tried so far:

df[['col3','col4']].ffill()  #how to account for the multiindex?
df[['col3','col4']].fillna(df.groupby(['ID','t'])[['col3', 'col4']].ffill()) #did not work
df.reindex(['ID','t'], method='ffill') #this is probably incomplete, and I got 'expected Tuple, got str'
noiivice
  • 400
  • 2
  • 15

2 Answers2

0

I think you were close with the second attempt. I would assume you only want to group by the first level of the multi-index, ID, since you want the column values to fill forward regardless of the second level of the multi-index, t. We can use a groupby on the first level of the multi-index as below:

df1 = df.groupby(level=0)[['col3', 'col4']].ffill()

To join these results back:

df.drop(['col3', 'col4'], axis=1, inplace=True)
df = df.join(df1)
kelvt
  • 949
  • 6
  • 16
0

IIUC use:

df.update(df.groupby(level=0)[['col3', 'col4']].ffill())
print (df)
        col1   col2  col3  col4
ID  t                          
id1 t1    10    NaN   NaN   1.0
    t2    10  110.0   1.0   1.0
    t3    12    NaN   1.0   1.0
id2 t1    12  109.0  15.0   1.0
    t4    12  109.0  15.0   1.0
    t7    20    NaN  15.0   1.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252