In answering this Stack Overflow question, I found some interesting behavior when using a fill method while reindexing a DataFrame.

This old pandas bug report says that df.reindex(newIndex, method='ffill') should be equivalent to df.reindex(newIndex).ffill(), but that is NOT the behavior I'm seeing.

Here's a code snippet that illustrates the behavior:

import pandas as pd

df = pd.DataFrame({'values': 2}, index=pd.DatetimeIndex(['2016-06-02', '2016-05-04', '2016-06-03']))
newIndex = pd.DatetimeIndex(['2016-05-04', '2016-06-01', '2016-06-02', '2016-06-03', '2016-06-05'])
print(df.reindex(newIndex).ffill())
print(df.reindex(newIndex, method='ffill'))

The first print statement works as expected. The second raises:

ValueError: index must be monotonic increasing or decreasing

What's going on here?


EDIT: Note that the sample df intentionally has a non-monotonic index. The question is about the order of operations in df.reindex(newIndex, method='ffill'). My expectation, as the bug report says, is that it should first reindex with the new index and then fill.

As you can see, newIndex.is_monotonic is True, and the fill works when called separately but fails when passed as a parameter to reindex.

michael_j_ward

2 Answers

Some element of reindex requires the incoming index to be sorted. I'm deducing that when method is passed, it fails to presort the incoming index and subsequently fails. I'm drawing this conclusion based on the fact that this works:

print(df.sort_index().reindex(newIndex.sort_values(), method='ffill'))
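
To make the effect of the workaround visible, here is a minimal sketch, assuming a recent pandas version. The distinct values 1, 2, 3 and the snake_case names are illustrative and not from the question; the dates are the ones used there. It suggests that sorting the original frame is the part that matters, and that the result then matches the two-step reindex followed by ffill.

```python
import pandas as pd

# Same dates as in the question, but with distinct (made-up) values so the
# effect of the forward fill is visible.
df = pd.DataFrame(
    {"values": [1, 2, 3]},
    index=pd.DatetimeIndex(["2016-06-02", "2016-05-04", "2016-06-03"]),
)
new_index = pd.DatetimeIndex(
    ["2016-05-04", "2016-06-01", "2016-06-02", "2016-06-03", "2016-06-05"]
)

# With a non-monotonic original index, passing method= raises.
try:
    df.reindex(new_index, method="ffill")
    raised = False
except ValueError:
    raised = True

# Sorting the original frame first lets the fill method work.
filled = df.sort_index().reindex(new_index, method="ffill")

# Two-step alternative: reindex first, then fill the gaps in the result.
two_step = df.reindex(new_index).ffill()

print(raised)
print(filled)
print(two_step)
```

For this data the two approaches give the same values, although the two-step version returns floats because the intermediate result contains NaN.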
piRSquared
  • This is indeed the issue. The stack trace: `1940 indexer = self.get_indexer(target) 1941 nonexact = (indexer == -1) -> 1942 indexer[nonexact] = self._searchsorted_monotonic(target[nonexact], side) 1943 if side == 'left': 1944 # searchsorted returns "indices into a sorted array such that, ` shows that the index must be sorted for this to work. This makes sense, as you can't `ffill` if the index isn't sorted – EdChum Jun 23 '16 at 08:15
  • @piRSquared It was late when I wrote this question last night. Note that `all(newIndex.sort_values()==newIndex)` is `True`. The piece of your snippet that makes it work is the `df.sort_index()` call. My example `df` intentionally has a non-monotonic index. My expectation was that `reindex(newIndex, method='ffill')` would FIRST reindex and then fill, and not the other way around. – michael_j_ward Jun 23 '16 at 11:26
  • @EdChum, I agree `ffill` on a non-monotonic index doesn't make sense. But my `newIndex` IS monotonic. My expectation on `df.reindex(newIndex, method='ffill')` would be to first reindex with `newIndex` and then fill. But that is clearly not what is happening. – michael_j_ward Jun 23 '16 at 11:27
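
To illustrate the lookup that the stack trace points at, here is a minimal sketch, assuming a recent pandas version. The integer labels are made up for illustration and are not from the question. With a fill method, each new label is looked up in the existing index, which is why that existing index has to be monotonic, regardless of whether the new one is.

```python
import pandas as pd

# Made-up labels: an existing (sorted) index and the labels we want to look up.
old = pd.Index([10, 20, 30])
target = pd.Index([10, 15, 25, 35])

# For each target label, find the position of the last existing label <= it.
positions = old.get_indexer(target, method="ffill")
print(positions)

# The same lookup on an unsorted existing index is rejected.
unsorted_old = pd.Index([20, 10, 30])
try:
    unsorted_old.get_indexer(target, method="ffill")
    raised = False
except ValueError:
    raised = True
print(raised)
```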
It seems that the sorting needs to be done on the columns as well.

In[76]: frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'],columns=['Ohio', 'Texas', 'California'])

In[77]: frame.reindex(index=['a','b','c','d'],method='ffill',columns=states)
---> ValueError: index must be monotonic increasing or decreasing

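# Note: list.sort() sorts in place and returns None, so this call passes columns=None and leaves the columns unchanged.
In[78]: frame.reindex(index=['a','b','c','d'],method='ffill',columns=states.sort())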

Out[78]:
  Ohio  Texas  California
a     0      1           2
b     0      1           2
c     3      4           5
d     6      7           8
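
One way to keep the fill method away from the columns is to reindex in two calls. This is a minimal sketch, assuming a recent pandas version; the contents of `states` are hypothetical, since the original list is not shown in this answer.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=["a", "c", "d"],
    columns=["Ohio", "Texas", "California"],
)

# Hypothetical column list; the original `states` is not shown in the answer.
states = ["Texas", "Utah", "California"]

# Step 1: forward-fill along the rows only (the row index is already sorted).
rows_filled = frame.reindex(index=["a", "b", "c", "d"], method="ffill")

# Step 2: pick the columns in a separate call, with no fill method.
result = rows_filled.reindex(columns=states)
print(result)
```

If a sorted copy of the column list is wanted, `sorted(states)` returns a new list, whereas `states.sort()` sorts in place and returns None.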

Bharath M Shetty
Eric