
I have a DataFrame df like this:

                                  t        pos
frame
0     2015-11-21 14:46:32.843517000   0.000000
1                               NaT   0.000000
2                               NaT   0.000000
3                               NaT   0.000000
4                               NaT   0.000000
5                               NaT   0.000000
6                               NaT   0.000000
7                               NaT   0.000000
8                               NaT   0.000000
9                               NaT   0.000000
10                              NaT   0.000000
11                              NaT   0.000000
12                              NaT   0.000000
13                              NaT   0.000000
14                              NaT   0.000000
15                              NaT   0.000000
16                              NaT   0.000000
17                              NaT   0.000000
18                              NaT   0.000000
19                              NaT   0.000000
...                             ...        ...
304   2015-11-21 14:46:54.255383750  12.951807
305   2015-11-21 14:46:54.312271250   5.421687
306   2015-11-21 14:46:54.343288000   3.614458
307   2015-11-21 14:46:54.445307000   1.204819
308   2015-11-21 14:46:54.477091000   0.000000
309                             NaT   0.000000
310                             NaT   0.000000
311                             NaT   0.000000
312                             NaT   0.000000
313                             NaT   0.000000
314   2015-11-21 14:46:54.927361000   1.204819
315   2015-11-21 14:46:55.003917250   4.819277
316   2015-11-21 14:46:55.058081500  12.048193
317   2015-11-21 14:46:55.112070500  24.698795
318   2015-11-21 14:46:55.167366000  34.538153
319   2015-11-21 14:46:55.252116750  29.718876
320   2015-11-21 14:46:55.325177750  16.064257
321   2015-11-21 14:46:55.396772000   6.927711
322   2015-11-21 14:46:55.448250000   3.614458
323   2015-11-21 14:46:55.559872500   0.602410

I would like to replace the NaT values in column t with actual pandas.tslib.Timestamp values.

I found http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.fillna.html but I can't find a fill method there that does this.

There is probably a workaround, though.

CT Zhu
scls
  • Are you after [`interpolate`](http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.interpolate.html#pandas.Series.interpolate)? – EdChum Nov 25 '15 at 16:37
  • `df['t'].interpolate()` doesn't seem to work with `pandas.tslib.Timestamp`. Try this: `s = pd.Series(pd.date_range('2015-01-01', '2015-01-10'))`, then `s[3], s[4], s[5] = pd.NaT, pd.NaT, pd.NaT` and `s.interpolate()` – scls Nov 25 '15 at 16:44

1 Answer


You are right that the interpolate method currently does not work with Timestamp. One solution is to convert the column to float, interpolate it, and convert it back to Timestamp:

In [63]:

print(df)
   pos                             t
0    0 2015-11-21 14:46:54.445307000
1    1 2015-11-21 14:46:54.477091000
2    2                           NaT
3    3                           NaT
4    4                           NaT
5    5                           NaT
6    6 2015-11-21 14:46:54.927361000
7    7 2015-11-21 14:46:55.003917250
In [64]:

pd.to_datetime(pd.to_numeric(df.t).interpolate())
Out[64]:
0   2015-11-21 14:46:54.445306880
1   2015-11-21 14:46:54.477091072
2   2015-11-21 14:46:54.567144960
3   2015-11-21 14:46:54.657199104
4   2015-11-21 14:46:54.747252992
5   2015-11-21 14:46:54.837307136
6   2015-11-21 14:46:54.927361024
7   2015-11-21 14:46:55.003917312
Name: t, dtype: datetime64[ns]
In [65]:

# Overwrite only the NaT rows so the existing timestamps keep their exact values
df.loc[df.t.isnull(), 't'] = pd.to_datetime(pd.to_numeric(df.t).interpolate())[df.t.isnull()]
print(df)
   pos                             t
0    0 2015-11-21 14:46:54.445307000
1    1 2015-11-21 14:46:54.477091000
2    2 2015-11-21 14:46:54.567144960
3    3 2015-11-21 14:46:54.657199104
4    4 2015-11-21 14:46:54.747252992
5    5 2015-11-21 14:46:54.837307136
6    6 2015-11-21 14:46:54.927361000
7    7 2015-11-21 14:46:55.003917250

However, note that because of precision loss in the float conversion (I guess that is the reason), the round-tripped values are slightly off, by a fraction of a microsecond. It is therefore wise to fill only the NaT entries with the interpolated values and leave the existing timestamps as they are.
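One way to sidestep the precision loss is sketched below. It is not the method used above: instead of converting absolute epoch values to float, it interpolates nanosecond offsets from the earliest valid timestamp, which stay small enough for float64 to hold exactly over spans of a few months. The helper name `fill_nat_linear` is made up for illustration, and `limit_area="inside"` is an assumption about the desired behaviour (leading and trailing NaT are left alone rather than extrapolated).

```python
import pandas as pd


def fill_nat_linear(ts: pd.Series) -> pd.Series:
    """Fill interior NaT values of a datetime Series by linear interpolation.

    Hypothetical helper: interpolates over row position, not over another column.
    """
    origin = ts.min()  # min() skips NaT by default
    # Offset of each row from the origin in nanoseconds, as float64; NaT becomes NaN.
    offset_ns = (ts - origin) / pd.Timedelta(1, unit="ns")
    # Fill only NaNs that have valid neighbours on both sides.
    interpolated = offset_ns.interpolate(limit_area="inside").round()
    # Convert the offsets back to timestamps; remaining NaN turns into NaT again.
    return origin + pd.to_timedelta(interpolated, unit="ns")


# Same shape as the example frame above: two valid rows, four gaps, two valid rows.
ts = pd.Series(pd.to_datetime([
    "2015-11-21 14:46:54.445307",
    "2015-11-21 14:46:54.477091",
    None, None, None, None,
    "2015-11-21 14:46:54.927361",
    "2015-11-21 14:46:55.003917",
]))
filled = fill_nat_linear(ts)
print(filled)
```

With a DataFrame like the one above this would be used as `df['t'] = fill_nat_linear(df['t'])`. Because the offsets round-trip exactly in this example, the existing timestamps come back unchanged, so no separate mask is needed.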

CT Zhu
  • Do you think that I should open an issue ? – scls Nov 25 '15 at 17:29
  • Yes, why not. It is also something easy to implement. The only thing is that the interpolated values are ~1e-6 off. That might matter for some applications, say space probes? – CT Zhu Nov 25 '15 at 17:32
  • Issue opened: https://github.com/pydata/pandas/issues/11701. Feel free to help by sending a PR. – scls Nov 25 '15 at 17:41
  • This currently does not work: `to_numeric` converts the NaTs to a very large negative value, so there are no NaNs left to interpolate over. I had to use `numeric = pd.to_numeric(df.t)` followed by `numeric[numeric < 0] = numpy.nan`, and then interpolate over that to make it work. – RunOrVeith Mar 11 '19 at 14:57