
I have a dataset that has two columns: company, and value.
It has a datetime index, which contains duplicates (on the same day, different companies have different values). The values have missing data, so I want to forward fill the missing data with the previous datapoint from the same company.

However, I can't seem to find a good way to do this without running into odd groupby errors, suggesting that I'm doing something wrong.

Toy data:

import pandas as pd

a = pd.DataFrame({'a': [1, 2, None], 'b': [12, None, 14]})
a.index = pd.DatetimeIndex(['2010', '2011', '2012'])
a = a.unstack()                            # long Series indexed by (company, date)
a = a.reset_index().set_index('level_1')   # date index, now with duplicates
a.columns = ['company', 'value']
a.sort_index(inplace=True)

Attempted solutions (all of which fail with ValueError: cannot reindex from a duplicate axis):

a.groupby('company').ffill() 
a.groupby('company')['value'].ffill() 
a.groupby('company').fillna(method='ffill')

Hacky solution (that delivers the desired result, but is obviously just an ugly workaround):

a['value'] = a.reset_index().groupby(
    'company').fillna(method='ffill')['value'].values

There is probably a simple and elegant way to do this; how is it done in Pandas?

– sapo_cosmico

3 Answers


One way is to use transform to fill the value column after the groupby:

import pandas as pd
a['value'] = a.groupby('company')['value'].transform(lambda v: v.ffill())

a
#          company  value
#level_1        
#2010-01-01      a    1.0
#2010-01-01      b   12.0
#2011-01-01      a    2.0
#2011-01-01      b   12.0
#2012-01-01      a    2.0
#2012-01-01      b   14.0

To compare, the original data frame looks like:

#            company    value
#level_1        
#2010-01-01        a      1.0
#2010-01-01        b     12.0
#2011-01-01        a      2.0
#2011-01-01        b      NaN
#2012-01-01        a      NaN
#2012-01-01        b     14.0
– Psidom
  • I see no diff in the results between `a['value'] = a.groupby('company')['value'].transform(lambda v: v.ffill())` and `a['value'] = a.groupby('company')['value'].ffill()` – Benoit Jun 19 '20 at 05:35
  • `a['value'] = a.groupby('company')['value'].ffill()` is much faster – Anup G Prasad Jun 13 '22 at 04:52
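For reference, a quick check on the toy frame from the question that the direct `ffill` from the comments matches the `transform` version (on recent pandas versions, where the duplicate-index ValueError no longer occurs):

```python
import pandas as pd

# Rebuild the toy frame from the question.
a = pd.DataFrame({'a': [1, 2, None], 'b': [12, None, 14]})
a.index = pd.DatetimeIndex(['2010', '2011', '2012'])
a = a.unstack().reset_index().set_index('level_1')
a.columns = ['company', 'value']
a.sort_index(inplace=True)

# Fill per company, both ways.
via_transform = a.groupby('company')['value'].transform(lambda v: v.ffill())
via_ffill = a.groupby('company')['value'].ffill()
print(via_transform.equals(via_ffill))  # True
```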

You can add 'company' to the index, making it unique, and do a simple ffill via groupby:

a = a.set_index('company', append=True)
a = a.groupby(level=1).ffill()

From here, you can use reset_index to revert the index back to just the date, if necessary. I'd recommend keeping 'company' as part of the index (or just adding it to the index to begin with), so your index remains unique:

a = a.reset_index(level=1)
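Putting those steps together, a runnable sketch (rebuilding the toy frame from the question):

```python
import pandas as pd

# Rebuild the toy frame from the question.
a = pd.DataFrame({'a': [1, 2, None], 'b': [12, None, 14]})
a.index = pd.DatetimeIndex(['2010', '2011', '2012'])
a = a.unstack().reset_index().set_index('level_1')
a.columns = ['company', 'value']
a.sort_index(inplace=True)

# Append 'company' to the index so each (date, company) pair is unique,
# then forward-fill within each company group.
a = a.set_index('company', append=True)
a = a.groupby(level=1).ffill()

# Optionally move 'company' back out of the index.
a = a.reset_index(level=1)
```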
– root

I like to use stacking and unstacking. In this case, it requires appending 'company' to the index first.

a.set_index('company', append=True).unstack().ffill() \
                                   .stack().reset_index('company')
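As a runnable sketch (rebuilding the toy frame from the question; the result matches the filled frame in @Psidom's answer):

```python
import pandas as pd

# Rebuild the toy frame from the question.
a = pd.DataFrame({'a': [1, 2, None], 'b': [12, None, 14]})
a.index = pd.DatetimeIndex(['2010', '2011', '2012'])
a = a.unstack().reset_index().set_index('level_1')
a.columns = ['company', 'value']
a.sort_index(inplace=True)

# Pivot companies into columns, forward-fill down the dates,
# then pivot back to the long format.
result = (a.set_index('company', append=True)
           .unstack()
           .ffill()
           .stack()
           .reset_index('company'))
```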

[image in original answer: the resulting frame]


Timing

Conclusion: @Psidom's solution works best under both scenarios.

toy data

[timing plot in original answer]

bigger toy

np.random.seed([3,1415])
n = 10000
a = pd.DataFrame(np.random.randn(n, 10),
                 pd.date_range('2014-01-01', periods=n, freq='H', name='Time'),
                 pd.Index(list('abcdefghij'), name='company'))

a *= np.random.choice((1, np.nan), (n, 10), p=(.6, .4))

a = a.stack(dropna=False).rename('value').reset_index('company')
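A hedged timing sketch along these lines (melt is used here instead of stack(dropna=False), which newer pandas deprecates, to keep the NaN rows; absolute numbers depend on the pandas version and hardware):

```python
import numpy as np
import pandas as pd
from timeit import timeit

# Bigger toy data, as in the answer above.
np.random.seed([3, 1415])
n = 10000
a = pd.DataFrame(np.random.randn(n, 10),
                 index=pd.date_range('2014-01-01', periods=n, freq='H', name='Time'),
                 columns=pd.Index(list('abcdefghij'), name='company'))
a *= np.random.choice((1, np.nan), (n, 10), p=(.6, .4))

# Melt to long format, keeping the NaN rows.
a = (a.reset_index()
      .melt(id_vars='Time', var_name='company', value_name='value')
      .set_index('Time'))

# Compare the transform-based fill against the direct groupby ffill.
t_transform = timeit(
    lambda: a.groupby('company')['value'].transform(lambda v: v.ffill()),
    number=10)
t_direct = timeit(lambda: a.groupby('company')['value'].ffill(), number=10)
print(f'transform: {t_transform:.3f}s   direct ffill: {t_direct:.3f}s')
```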

[timing plot in original answer]

– piRSquared