1

I have a database I'm reading from excel as a pandas dataframe, and the dates come in Timestamp dtype, but I need them to be in np.datetime64, so that I can make calculations.

I am aware that the function pd.to_datetime() and the astype(np.datetime64[ns]) method do work. However, I am unable to update my dataframe to yield this datatype, for whatever reason, using the code mentioned above.

I have also tried creating an acessory dataframe from the original one, with just the dates that I wish to update the typing, converting it to np.datetime64 and plugging it back onto the original dataframe:

dfi = df['dates']
dfi = pd.to_datetime(dfi)
df['dates'] = dfi

But still it doesn't work. I have also tried updating values one by one:

arr_i = df.index
for i in range(len(arr_i)):
    df.at[arri[l],'dates'].to_datetime64()

Edit The root problem seems to be that the dtype of the column gets updated to np.datetime64, but somehow, when getting single values from within, they still have the dtype = Timestamp

Does anyone have a suggestion of a workaround that is fairly fast?

  • Why doesn't `df['dates'] = pd.to_datetime(df['dates'])` work? – m13op22 Jul 02 '19 at 18:40
  • Also, what do you mean by doesn't work? What is that output or error message? – m13op22 Jul 02 '19 at 18:41
  • The thing is, they work. There is no error message, the dataframe just does not get updated with the typing. – Engels Leonhardt Jul 02 '19 at 18:50
  • 1
    What dtype do you get for the column? "It doesn't work" doesn't really give us any information when we dont have the data to reproduce your problem. – ALollz Jul 02 '19 at 18:51
  • Is there a difference between column `dtype` and the `dtype` of the data inside? If so, when I print the acessory dataframe, it is an `np.datetime64`, but when I try to get individual values from within the column, I get `Timestamp`. I wil update the information on the question with this. – Engels Leonhardt Jul 02 '19 at 18:58

1 Answers1

1

Pandas tries to standardize all forms of datetimes by storing them as NumPy datetime64[ns] values when you assign them to a DataFrame. But when you try to access individual datetime64 values, they are returned as Timestamps.

There is a way to prevent this automatic conversion from happening however: Wrap the list of values in a Series of dtype object:

import numpy as np
import pandas as pd

# create some dates, merely for example
dates = pd.date_range('2000-1-1', periods=10)
# convert the dates to a *list* of datetime64s
arr = list(dates.to_numpy())
# wrap the values you wish to protect in a Series of dtype object.
ser = pd.Series(arr, dtype='object')

# assignment with `df['datetime64s'] = ser` would also work
df = pd.DataFrame({'timestamps': dates,
                   'datetime64s': ser})

df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 10 entries, 0 to 9
# Data columns (total 2 columns):
# timestamps     10 non-null datetime64[ns]
# datetime64s    10 non-null object
# dtypes: datetime64[ns](1), object(1)
# memory usage: 240.0+ bytes

print(type(df['timestamps'][0]))
# <class 'pandas._libs.tslibs.timestamps.Timestamp'>

print(type(df['datetime64s'][0]))
# <class 'numpy.datetime64'>

But beware! Although with a little work you can circumvent Pandas' automatic conversion mechanism, it may not be wise to do this. First, converting a NumPy array to a list is usually a sign you are doing something wrong, since it is bad for performance. Using object arrays is a bad sign since operations on object arrays are generally much much slower than equivalent operations on arrays of native NumPy dtypes.

You may be looking at an XY problem -- it may be more fruitful to find a way to (1) work with Pandas Timestamps instead of trying to force Pandas to return NumPy datetime64s or (2) work with datetime64 array-likes (e.g. Series of NumPy arrays) instead of handling values individually (which causes the coersion to Timestamps).

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • I understand, and it makes sense. I should be able to preprocess my dataset well enough to make calculations with the whole columns, I guess it's the way pandas was supposed to be used. Thank you for the explanation. – Engels Leonhardt Jul 03 '19 at 10:38