I have tried many things and cannot get this to work. I want to do this because an error occurs when I try to convert this ndarray to a DataFrame. The following error is raised when missing datetime64 values are encountered:

"Out of bounds nanosecond timestamp: 1-01-01 00:00:00"

Therefore, I want to convert the datetime64 columns of the ndarray to strings, recode '1-01-01 00:00:00', and then convert them back to datetime values in the DataFrame to avoid the error shown above.

import savReaderWriter as sRW

with sRW.SavReaderNp('C:/Users/Sam/Downloads/data.sav') as reader:
    record = reader.all()

prints:

[(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '2019-08-05T00:00:00.000000',
(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '2019-08-05T00:00:00.000000',
(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '0001-01-01T00:00:00.000000',)]

1 Answer

First of all, please check that your post is valid, i.e. that it contains runnable code. Your example raises a syntax error, and the code for the attempt you describe is simply not there.


However, I assume your data looks like

arr = [(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '2019-08-05T00:00:00.000000'),
(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '2019-08-05T00:00:00.000000'),
(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', 250000., '0001-01-01T00:00:00.000000')]

which, converted to a DataFrame, looks like

import pandas as pd

df = pd.DataFrame(arr, columns=['ID', 'value', 'date'])

#                                         ID  ...                        date
# 0  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'  ...  2019-08-05T00:00:00.000000
# 1  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'  ...  2019-08-05T00:00:00.000000
# 2  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'  ...  0001-01-01T00:00:00.000000

Then your attempt to convert the date strings into datetime objects was probably

df.date = pd.to_datetime(df.date)

# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00

which results in the error message you posted in your question.
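
As background on why this date is "out of bounds": a nanosecond-resolution datetime64 stores the number of nanoseconds since 1970-01-01 in a signed 64-bit integer, so it can only represent dates between roughly 1677 and 2262. A minimal sketch with plain NumPy that derives those limits (the variable names are only illustrative):

```python
import numpy as np

# datetime64[ns] counts nanoseconds since 1970-01-01 in a signed 64-bit integer.
# The smallest int64 value is reserved for NaT, so the usable range starts one above it.
int64_info = np.iinfo(np.int64)
lowest = np.datetime64(int64_info.min + 1, "ns")
highest = np.datetime64(int64_info.max, "ns")

print(lowest)   # 1677-09-21T00:12:43.145224193
print(highest)  # 2262-04-11T23:47:16.854775807
```

The year 0001 lies far below the lower limit, which is why the placeholder date cannot be stored at nanosecond resolution.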

You can catch these parsing errors with the errors kwarg of pd.to_datetime:

df.date = pd.to_datetime(df.date, errors='coerce')

#                                         ID     value       date
# 0  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'  250000.0 2019-08-05
# 1  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'  250000.0 2019-08-05
# 2  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'  250000.0        NaT
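
One thing to keep in mind with errors='coerce' is that it turns every value it cannot handle into NaT, not only the placeholder date. A small sketch of how you might count the coerced values afterwards (the sample strings and variable names are made up for illustration):

```python
import pandas as pd

# Made-up sample: one valid timestamp and one string that is not a date at all.
dates = pd.Series(["2019-08-05T00:00:00.000000", "not a date"])

parsed = pd.to_datetime(dates, errors="coerce")

# Values that were present before parsing but are NaT afterwards were coerced.
n_coerced = int(parsed.isna().sum() - dates.isna().sum())
print(n_coerced)
```

If that count is larger than the number of placeholder dates you expect, something else in the column failed to parse as well.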
SpghttCd
  • Not quite, the error occurs when I execute df = pd.DataFrame(record). So I’m trying to change the dtype before it’s converted to a DataFrame, if that makes sense? –  Oct 10 '19 at 06:57
  • Then please don't let us guess what you do - [ask] / [mcve] – SpghttCd Oct 10 '19 at 06:58
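
If the error is already raised by pd.DataFrame(record), one possible approach is to turn the datetime64 fields of the array into strings before pandas sees them, recode the placeholder, and only then parse the dates. The sketch below is an assumption-laden illustration, not tested against a real .sav file: record is a hand-built stand-in for what reader.all() returns, and its field names and dtypes (ID, value, date, datetime64[us]) are guesses based on the output in the question.

```python
import numpy as np
import pandas as pd

SENTINEL = "0001-01-01T00:00:00.000000"  # placeholder used for missing dates

# Hypothetical stand-in for the structured array returned by reader.all().
record = np.array(
    [
        (b"61D8894E-7FB0-3DE6-E053-6C04A8C01207", 250000.0,
         np.datetime64("2019-08-05T00:00:00", "us")),
        (b"61D8894E-7FB0-3DE6-E053-6C04A8C01207", 250000.0,
         np.datetime64("0001-01-01T00:00:00", "us")),
    ],
    dtype=[("ID", "S36"), ("value", "f8"), ("date", "datetime64[us]")],
)

# Build the columns one by one, converting datetime64 fields to ISO strings
# so that pandas never has to store the out-of-range value as a timestamp.
columns = {}
for name in record.dtype.names:
    field = record[name]
    if np.issubdtype(field.dtype, np.datetime64):
        field = np.datetime_as_string(field)
    columns[name] = field

df = pd.DataFrame(columns)

# Recode the placeholder as missing, then parse the remaining strings.
is_placeholder = df["date"] == SENTINEL
df["date"] = pd.to_datetime(df["date"].mask(is_placeholder), errors="coerce")

print(df["date"])
```

Recoding the placeholder explicitly, instead of relying only on errors='coerce', makes it clear which rows were treated as missing on purpose.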