3

I have 4 dataframes with data of similar datetime indexes, however in each of them there are few missing lines and I know that the gaps can be filled using previous known data.

I would like to 'align' these dataframes so that they have union of indexes of all dataframes and to fill in missing values. I know how to do it for 2 dataframes: df1, df2 = df1.align(df2, axis=0, method='pad'), but what is the good way to do it for more than 2?

I have tried this and it does work:

df1 = pd.DataFrame({'values': 1}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-03']))
df2 = pd.DataFrame({'values': 2}, index=pd.DatetimeIndex(['2016-06-02', '2016-06-04', '2016-06-07']))
df3 = pd.DataFrame({'values': 3}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-05']))

commonIndex = df1.index.join(df2.index, how='outer').join(df3.index, how='outer')

for d in [df1, df2, df3]:
    d = d.reindex(index=commonIndex, method='pad')

However when I try same approach on my real data, it gives error: "ValueError: index must be monotonic increasing or decreasing". It is price data, and index looks like this:

DatetimeIndex(['2014-03-24 00:00:00', '2014-03-24 00:01:00',
           '2014-03-24 00:02:00', '2014-03-24 00:03:00',
           '2014-03-24 00:04:00', '2014-03-24 00:05:00',
           '2014-03-24 00:06:00', '2014-03-24 00:07:00',
           '2014-03-24 00:08:00', '2014-03-24 00:09:00',
           ...
           '2014-10-10 17:51:00', '2014-10-10 17:52:00',
           '2014-10-10 17:53:00', '2014-10-10 17:54:00',
           '2014-10-10 17:55:00', '2014-10-10 17:56:00',
           '2014-10-10 17:57:00', '2014-10-10 17:58:00',
           '2014-10-10 17:59:00', '2014-10-10 18:00:00'],
          dtype='datetime64[ns]', name=u'datetime', length=139671, freq=None)

As far as I understand it should be increasing. Not sure about 'monotonic', but I assumed that it wasn't a constraint as demonstrated in above example (there were still missing dates).

Any help is greatly appreciated, and forgive me if I'm not using the correct terminology.

P.S. When I iterate through list of dataframes it looks like they are not actually saved in df1, df2, df3 after applying reindexing. How do I fix that?

danilam
  • 75
  • 2
  • 5

1 Answers1

3

Is this the behavior you are trying to achieve? Note that this method works regardless of whether or not the indexes on the dataframes are monotonic.

df1 = pd.DataFrame({'values': 1}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-03']))
df2 = pd.DataFrame({'values': 2}, index=pd.DatetimeIndex(['2016-06-02', '2016-06-04', '2016-06-07']))
df3 = pd.DataFrame({'values': 3}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-05']))

df = pd.concat([df1,df2,df3], axis=1).ffill().bfill()
df.columns = ['values1', 'values2', 'values3']
df

Which gives:

          values1  values2  values3
2016-05-04  1.0     2.0     3.0
2016-06-01  1.0     2.0     3.0
2016-06-02  1.0     2.0     3.0
2016-06-03  1.0     2.0     3.0
2016-06-05  1.0     2.0     3.0

Or if you just want the data-frames left separate, this will also work regardless of whether the data-frame has a monotonic index.

commonIndex = df1.index | df2.index | df3.index
df2.reindex(commonIndex).ffill()

EDIT:

I had a snippet here that reproduced your error, but I think it works better as its own question- so take a look here.

Community
  • 1
  • 1
michael_j_ward
  • 4,369
  • 1
  • 24
  • 25
  • does reassigning to the variable in a for loop work like that? – Andy Hayden Jun 23 '16 at 03:43
  • @michael_j_ward I'm pretty sure my real data is also increasing. Is there a way to check for it or even to show when it's not the case? Thanks. – danilam Jun 23 '16 at 03:44
  • also, your code gives an error `ValueError: index must be monotonic increasing or decreasing`. – Andy Hayden Jun 23 '16 at 03:45
  • @AndyHayden, the second snippet SHOULD produce that error., the point was that I was able to reproduce OPs error. Further, it is only reproduced when the `method='pad'` parameter is called. I was hoping to further isolate the problem so that others could provide insight – michael_j_ward Jun 23 '16 at 03:48
  • @danilam, yes, `df.index.is_monotonic`. As you can see in my snippet where I reproduce your error, `df2` is NOT monotonic. – michael_j_ward Jun 23 '16 at 03:49
  • I see. commonIndex is also not used in your first snippet. – Andy Hayden Jun 23 '16 at 03:50
  • @AndyHayden, agreed. Removed that for clarity. – michael_j_ward Jun 23 '16 at 03:51
  • Great! Note: pad == ffill. – Andy Hayden Jun 23 '16 at 03:54
  • Thank you all. @michael_j_ward that's not what I'm trying to achieve. I don't need one df, but want to keep them separate. Just want them to have union of indexes and populate values for new rows with logic similar to pad/ffill. I will have a further look into monotonic indexes, because so far I don't get how they can be both monotonic and not... – danilam Jun 23 '16 at 04:10
  • 1
    @danilam then just put the fill method outside. I'll update in a second – michael_j_ward Jun 23 '16 at 04:13
  • @michael_j_ward Thank you, Michael. It also helped me to find out my data was indeed not sorted (long story, was doing multiple time slicing and then concatting). But it's interesting that unintentionally I stumbled upon buggy behavior. – danilam Jul 04 '16 at 05:31