
Can you explain this bizarre behaviour?

import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1986, 1987, 1988], 'bomb': np.arange(3)}).set_index('year')

In [9]: df.reindex(np.arange(1986, 1988.125, .125))
Out[9]: 
          bomb
1986.000     0
1986.125   NaN
1986.250   NaN
1986.375   NaN
1986.500   NaN
1986.625   NaN
1986.750   NaN
1986.875   NaN
1987.000     1
1987.125   NaN
1987.250   NaN
1987.375   NaN
1987.500   NaN
1987.625   NaN
1987.750   NaN
1987.875   NaN
1988.000     2

In [10]: df.reindex(np.arange(1986, 1988.1, .1))
Out[10]: 
        bomb
1986.0     0
1986.1   NaN
1986.2   NaN
1986.3   NaN
1986.4   NaN
1986.5   NaN
1986.6   NaN
1986.7   NaN
1986.8   NaN
1986.9   NaN
1987.0   NaN
1987.1   NaN
1987.2   NaN
1987.3   NaN
1987.4   NaN
1987.5   NaN
1987.6   NaN
1987.7   NaN
1987.8   NaN
1987.9   NaN
1988.0   NaN

When the increment is anything other than .125, the new index values do not "find" the old rows that have matching values, i.e. there seems to be a precision problem that is not being handled. This happens even if I convert the index to float first. What is going on, and what is the right way to do this? I've been able to get it to work with an increment of 0.1 by using

df.reindex(np.round(np.arange(1985, 2010 + dt, dt) * 10) / 10.0)
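The same rounding trick can be written a little more compactly with `np.round(..., 1)`, which snaps each grid point to the nearest tenth so whole years compare equal to the integer index again. A minimal sketch, assuming a one-decimal grid is all that is needed; the half-step stop value is an illustrative choice that keeps `arange` from occasionally producing one extra point past the end:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1986, 1987, 1988], 'bomb': np.arange(3)}).set_index('year')

dt = 0.1
# Stop half a step past the end so arange cannot overshoot by a whole step,
# then snap every grid point to one decimal so 1987.0 really is 1987.0
grid = np.round(np.arange(1986, 1988 + dt / 2, dt), 1)
fine = df.reindex(grid)

# The original whole-year rows are found again
print(fine.loc[[1986.0, 1987.0, 1988.0]])
```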

By the way, I'm doing this as the first step in linearly interpolating a number of columns (e.g. "bomb" is one of them). If there's a nicer way to do that, I'd happily be set straight.

CPBL
  • Looks like you actually want a datelike index; or do you really want a float index for some reason? What do you want as your final output? – Jeff Jun 27 '13 at 19:31
  • Yes, I guess it's datelike, but I don't need any special features beyond one decimal place of years. The final output is this: http://www.youtube.com/watch?v=1BGzzykW_QM&feature=youtu.be i.e. I have data for several years, and I want to interpolate column values onto a finer grid in order to animate smoothly. – CPBL Jun 27 '13 at 21:13

2 Answers


You are getting what you asked for. The reindex method only tries to fit the data onto the new index that you provide. As mentioned in the comments, you probably want dates in the index. I guess you were expecting the reindex method to also do the interpolation, which is a separate step:

df2 = df.reindex(np.arange(1986, 1988.125, .125))
df2['bomb'].interpolate()

1986.000    0.000
1986.125    0.125
1986.250    0.250
1986.375    0.375
1986.500    0.500
1986.625    0.625
1986.750    0.750
1986.875    0.875
1987.000    1.000
1987.125    1.125
1987.250    1.250
1987.375    1.375
1987.500    1.500
1987.625    1.625
1987.750    1.750
1987.875    1.875
1988.000    2.000
Name: bomb
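To interpolate several columns at once, the reindex and interpolate steps can be chained on the whole frame. A minimal sketch, assuming a float year index; the `other` column is invented for illustration. Taking the union of the original years and the fine grid keeps the original rows even when the grid points are slightly off, and `method='index'` interpolates against the actual index values (the default `method='linear'` ignores the index and treats rows as equally spaced):

```python
import numpy as np
import pandas as pd

# 'bomb' as in the question plus a hypothetical second column 'other'
df = pd.DataFrame({'year': [1986.0, 1987.0, 1988.0],
                   'bomb': [0.0, 1.0, 2.0],
                   'other': [10.0, 30.0, 20.0]}).set_index('year')

grid = np.arange(1986, 1988 + 0.05, 0.1)

# Keep the original rows so interpolation has its anchor points,
# interpolate using the numeric index values, then keep only the grid rows
fine = (df.reindex(df.index.union(grid))
          .interpolate(method='index')
          .reindex(grid))
print(fine)
```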

The inconsistency in your second example is due to floating-point precision. A step of 0.125 equals 1/8, which is exactly representable in binary. A step of 0.1 has no exact binary representation, so the generated value near 1987 is slightly off from 1987.0 and does not match it exactly:

1987.0 == 1987.0000000001
False
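The exactness argument can be checked directly: 1/8 is a power of two, so integer multiples of 0.125 land exactly on whole years, while 0.1 has no finite binary expansion. A small sketch using `fractions.Fraction` to show the exact value a float actually stores, and a tolerance-based comparison in place of `==`:

```python
from fractions import Fraction

import numpy as np

# Exact rational value stored for each step size
print(Fraction(0.125))  # exactly 1/8
print(Fraction(0.1))    # a nearby binary fraction, not 1/10

eighths = np.arange(1986, 1988.125, 0.125)
tenths = np.arange(1986, 1988.1, 0.1)

# With a step of 1/8, 1987.0 is generated exactly, so an exact lookup succeeds
print(1987.0 in eighths)
# With a step of 0.1, compare with an explicit tolerance instead of ==
print(np.isclose(tenths, 1987.0, rtol=0, atol=1e-9).any())
```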
Joop
  • Thanks. No, I wasn't expecting reindex to do the interpolation. As I said, this was the first step / setup for interpolate. "1/8 which can be exactly done in binary" is the main insight I was missing. But I still don't see that I got what I asked for, especially since the example fails even if the index is a float. – CPBL Jun 27 '13 at 23:00
  • See here: http://pandas.pydata.org/pandas-docs/dev/indexing.html#fallback-indexing. Float indices are almost always a bad idea: since you can't ever exactly match all floats, you have a problem. Either use a datetime-like index, an int index, multiple columns, or even a string index. – Jeff Jun 27 '13 at 23:21
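Following the suggestion to avoid a float index, one option is to index by an integer count of tenths of a year, so every lookup is an exact integer comparison, and only convert back to fractional years at the end for display or plotting. A minimal sketch; the `tenth` index name is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1986, 1987, 1988], 'bomb': np.arange(3)})

# Integer index: years expressed in tenths (1986 -> 19860)
df = df.set_index(df['year'] * 10).drop(columns='year')
df.index.name = 'tenth'

# Every tenth of a year between the first and last data point, as integers
grid = np.arange(df.index.min(), df.index.max() + 1)
fine = df.reindex(grid).interpolate()

# Convert back to fractional years only at the end
fine.index = fine.index / 10
print(fine.head(12))
```

Because the integer steps are uniform, the default linear interpolation is appropriate here.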
I think you are better off using a PeriodIndex for something like this:

In [39]: df=pd.DataFrame({'bomb':np.arange(3)})

In [40]: df
Out[40]: 
   bomb
0     0
1     1
2     2

In [41]: df.index = pd.period_range('1986','1988',freq='Y').asfreq('M')

In [42]: df
Out[42]: 
         bomb
1986-12     0
1987-12     1
1988-12     2

In [43]: df = df.reindex(pd.period_range('1986','1988',freq='M'))

In [44]: df
Out[44]: 
         bomb
1986-01   NaN
1986-02   NaN
1986-03   NaN
1986-04   NaN
1986-05   NaN
1986-06   NaN
1986-07   NaN
1986-08   NaN
1986-09   NaN
1986-10   NaN
1986-11   NaN
1986-12     0
1987-01   NaN
1987-02   NaN
1987-03   NaN
1987-04   NaN
1987-05   NaN
1987-06   NaN
1987-07   NaN
1987-08   NaN
1987-09   NaN
1987-10   NaN
1987-11   NaN
1987-12     1
1988-01   NaN

In [45]: df.iloc[0,0] = -1

In [46]: df['interp'] = df['bomb'].interpolate()

In [47]: df
Out[47]: 
         bomb    interp
1986-01    -1 -1.000000
1986-02   NaN -0.909091
1986-03   NaN -0.818182
1986-04   NaN -0.727273
1986-05   NaN -0.636364
1986-06   NaN -0.545455
1986-07   NaN -0.454545
1986-08   NaN -0.363636
1986-09   NaN -0.272727
1986-10   NaN -0.181818
1986-11   NaN -0.090909
1986-12     0  0.000000
1987-01   NaN  0.083333
1987-02   NaN  0.166667
1987-03   NaN  0.250000
1987-04   NaN  0.333333
1987-05   NaN  0.416667
1987-06   NaN  0.500000
1987-07   NaN  0.583333
1987-08   NaN  0.666667
1987-09   NaN  0.750000
1987-10   NaN  0.833333
1987-11   NaN  0.916667
1987-12     1  1.000000
1988-01   NaN  1.000000
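One detail in the session above: `pd.period_range('1986', '1988', freq='M')` ends at 1988-01, so the 1988-12 row holding `bomb == 2` is dropped by the reindex, and 1988-01 is simply filled with the last value. A minimal sketch with the same data that extends the monthly range through 1988-12 so all three annual points survive:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'bomb': np.arange(3)})
# Annual periods converted to their last month: 1986-12, 1987-12, 1988-12
df.index = pd.period_range('1986', '1988', freq='Y').asfreq('M')

# Extend the monthly range through December 1988 so the last point is kept
monthly = pd.period_range('1986-01', '1988-12', freq='M')
df = df.reindex(monthly)

# Months before 1986-12 stay NaN: there is no earlier point to interpolate from
df['interp'] = df['bomb'].interpolate()
print(df.tail(13))
```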
Jeff