3

I am trying to have a vector of seconds between two time intervals:

import numpy as np
import pandas as pd    
date="2011-01-10"
start=np.datetime64(date+'T09:30:00')
end=np.datetime64(date+'T16:00:00')
range = pd.date_range(start, end, freq='S')

For some reason when I print range I get:

[2011-01-10 17:30:00, ..., 2011-01-11 00:00:00]

So the length is 23401 which is what I want but definitely not the correct time interval. Why is that?

Also, if I have a DataFrame df with a column of datetime64 format that looks like:

Time
15:59:57.887529007
15:59:57.805383290

Once I solved the problem above, will I be able to do the following:

data = df.reindex(df.Time + range) data = data.ffill() ??

I need to do the exact steps proposed here except with datetime64 format. Is it possible?

vvvvv
  • 25,404
  • 19
  • 49
  • 81
Plug4
  • 3,838
  • 9
  • 51
  • 79
  • I get something different when I print range (I see a DatetimeIndex), using pandas 0.13.1. Ah, I see it's a timezone thing. You're in UTC-8, and 14:00 + 8:00 == 00:00. – Andy Hayden Mar 07 '14 at 01:22

2 Answers2

3

It seems that pandas date_range is dropping the timezone (looks like a bug, I think it's already filed...), you can use Timestamp rather than datetime64 to workaround this:

In [11]: start = pd.Timestamp(date+'T09:30:00')

In [12]: end = pd.Timestamp(date+'T16:00:00')

In [13]: pd.date_range(start, end, freq='S')
Out[13]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-10 09:30:00, ..., 2011-01-10 16:00:00]
Length: 23401, Freq: S, Timezone: None

Note: To see it's a timezone, you're in UTC-8, and 14:00 + 8:00 == 00:00 (the next day).

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • thanks! but if I use `pd.Timestamp`, I can't do this: `data = df.reindex(df.Time + range)` after...right? It didn't work. So I have to stick to `datetime64` no? – Plug4 Mar 07 '14 at 01:30
  • @CharlesM you can't add a "time" column and a datetime column / index, you need to make it a timedelta or column of offsets... ? – Andy Hayden Mar 07 '14 at 01:38
  • so I should make `df.Time` other than a `datetime64`? If can't? I have to use `datetime64` since my data is in nanosecond – Plug4 Mar 07 '14 at 03:28
  • 1
    @CharlesM start and end are only there to construct the date_range which is a DatetimeIndex, which if you set to a column you'll see as a datetime64. – Andy Hayden Mar 07 '14 at 04:00
  • perfect I get it! Thank you – Plug4 Mar 07 '14 at 06:38
2

Is it because when you specify the datetime as a string, numpy assumes it is in localtime and converts it to UTC.

Specifying the time offset gives the correct interval though the interval is in UTC

start=np.datetime64(date+'T09:30:00+0000')
end=np.datetime64(date+'T16:00:00+0000')
range=pd.date_range(start,end,freq='S')

Or using a datetime.datetime object as the start and end and again the interval here is in UTC

import datetime
start = datetime.datetime(2011, 1, 10, 9, 30, 0)
end = datetime.datetime(2011, 1, 10, 16, 0, 0)
range=pd.date_range(start,end,freq='S')
  • @tanayamishah thanks! your answer is as good as the one below... I can't check both answers! – Plug4 Mar 07 '14 at 06:44