
I'm trying to convert pandas dataframe into SPSS format and have an issue converting a datetime64 variable.

Using the following code:

import pandas as pd

import datetime as dt

df['date1'] = (df['date'] - pd.Timestamp('1582-10-15 00:00')).astype('timedelta64[s]')

or

df['date1'] = (df['date'] - dt.datetime(1582, 10, 15)).astype('timedelta64[s]')

I get an "Out of bounds nanosecond timestamp: 1582-10-15 00:00:00" error.

When, just for fun, I try 1982 instead, it works!

I know there is a hard way to go to epoch from 1582 to 1970 and utc time etc, but is there an easy way? Thank you very much!

ibarant
    See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#timeseries-oob for how to deal with out of bounds timestamps – lanery May 20 '16 at 18:51

2 Answers


Timestamp breaks on such a long-ago date because pandas stores every Timestamp as a count of nanoseconds since the Unix epoch in a 64-bit integer. That only covers roughly the years 1677 through 2262, so a date in 1582 simply cannot be represented, and pandas raises OutOfBoundsDatetime.
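The representable range is easy to inspect directly:

```python
import pandas as pd

# The bounds of a nanosecond-resolution 64-bit Timestamp
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807
```

Anything outside this window, including 1582-10-15, triggers the out-of-bounds error.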

Solution

Use dt.datetime instead. Plain Python datetime objects have microsecond (not nanosecond) resolution and cover years 1 through 9999, so 1582 is no problem.

import pandas as pd

import datetime as dt

epoch = dt.datetime(1582, 10, 15)
date = dt.datetime(2016, 3, 31)

int((date - epoch).total_seconds())
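Applied to a whole column, the same subtraction can be done element-wise with plain Python datetimes (the sample dataframe below is illustrative):

```python
import datetime as dt

import pandas as pd

# Illustrative dataframe; in practice df['date'] is your datetime64 column
df = pd.DataFrame({'date': pd.to_datetime(['2016-03-31', '2016-04-01'])})
epoch = dt.datetime(1582, 10, 15)

# .dt.to_pydatetime() yields plain datetime objects, which are not limited
# to the nanosecond Timestamp window, so the subtraction never overflows
df['date1'] = [int((d - epoch).total_seconds())
               for d in df['date'].dt.to_pydatetime()]
```

This sidesteps the vectorized Timestamp arithmetic that raises OutOfBoundsDatetime.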
piRSquared
  • df['epoch'] = dt.datetime(1582, 10, 14) / df['date2'] = int((df['date'] - df['epoch']).total_seconds()) gives the same error: OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1582-10-14 00:00:00 – ibarant May 24 '16 at 16:06
  • The solution I use is: df['DELTA_secs'] = (dt.datetime(1970,1,1) - dt.datetime(1582, 10, 14)).total_seconds(); df['date0'] = (df['date'] - dt.datetime(1970,1,1)).astype('timedelta64[s]'); df['date2'] = df['date0'] + df['DELTA_secs']; df = df.drop(['DELTA_secs','date0'], 1) – ibarant May 24 '16 at 16:09
  • " So the error you're getting is the breaking down of the finest precision of time. " #Epic – kpie May 25 '16 at 00:17
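The 1970-pivot workaround from the comments can be written as a short self-contained sketch (the sample dataframe and column names are illustrative):

```python
import datetime as dt

import pandas as pd

# Illustrative dataframe; in practice df['date'] is your datetime64 column
df = pd.DataFrame({'date': pd.to_datetime(['2016-03-31', '2016-04-01'])})

# Fixed offset between the SPSS epoch (1582-10-14) and the Unix epoch,
# computed with plain datetimes so no out-of-bounds Timestamp is created
delta_secs = (dt.datetime(1970, 1, 1) - dt.datetime(1582, 10, 14)).total_seconds()

# Seconds since 1970 stay well inside Timestamp's valid range
since_1970 = (df['date'] - dt.datetime(1970, 1, 1)).dt.total_seconds()

df['date2'] = since_1970 + delta_secs
```

The only out-of-range date, 1582-10-14, is handled entirely by the datetime module; pandas only ever subtracts dates near the Unix epoch.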

Timestamp limitations

A bit of a hack using hints from docs:

df = pd.DataFrame(pd.date_range('2016-01-01', periods=5, freq = 'D'), columns = ['date'])
df
Out[291]: 
        date
0 2016-01-01
1 2016-01-02
2 2016-01-03
3 2016-01-04
4 2016-01-05

# PeriodIndex:
pi = pd.PeriodIndex(df['date'].astype(str), freq='s')
pi
Out[293]: 
PeriodIndex(['2016-01-01 00:00:00', '2016-01-02 00:00:00',
             '2016-01-03 00:00:00', '2016-01-04 00:00:00',
             '2016-01-05 00:00:00'],
            dtype='int64', freq='S')

# Period:
p0 = pd.Period('1582-10-15 00:00', freq='s')
p0
Out[295]: Period('1582-10-15 00:00:00', 'S')

# Then this is an Int64Index (in seconds):
idx = pi - p0
idx
Out[296]: Int64Index([13670899200, 13670985600, 13671072000, 13671158400, 13671244800], dtype='int64')
# idx.values gives a numpy array
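To get the values back into the dataframe, the result of the subtraction can be assigned to a column. Note that newer pandas versions return an index of Second offsets rather than plain integers; the hasattr check below is just to cover both behaviors:

```python
import pandas as pd

df = pd.DataFrame(pd.date_range('2016-01-01', periods=5, freq='D'),
                  columns=['date'])

pi = pd.PeriodIndex(df['date'].astype(str), freq='s')
p0 = pd.Period('1582-10-15 00:00', freq='s')

# Older pandas yields plain ints; newer pandas yields Second offsets,
# whose integer count is exposed as the .n attribute
df['date1'] = [off.n if hasattr(off, 'n') else int(off) for off in (pi - p0)]
```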
ptrj
  • Funny to be as precise as seconds over periods of hundreds of years without including a location... https://en.wikipedia.org/wiki/Railway_time – kpie May 25 '16 at 00:21