2

how can i convert a float64 type value into datetime type value.

here is the the first five float values from the dataset:

0    41245.0
1    41701.0
2    36361.0
3    36145.0
4    42226.0
Name: product_first_sold_date, dtype: float64

And to convert the float type to datetime type value I wrote this:

from datetime import datetime

pd.to_datetime(y['product_first_sold_date'], format='%m%d%Y.0', errors='coerce')

but as the output I got 'NaT' for all the rows in the dataset:

0       NaT
1       NaT
2       NaT
3       NaT
4       NaT

Name: product_first_sold_date, Length: 19273, dtype: datetime64[ns]

then, this:

print(pd.to_datetime(y.product_first_sold_date, infer_datetime_format=True))

but it shows the same date for all the rows in the dataset

0       1970-01-01 00:00:00.000041245
1       1970-01-01 00:00:00.000041701
2       1970-01-01 00:00:00.000036361
3       1970-01-01 00:00:00.000036145
4       1970-01-01 00:00:00.000042226

and I really can't figure out what's wrong with the code?

i have also tried this:

pd.to_datetime(pd.Series(g.product_first_sold_date).astype(str), format='%d%m%Y.0')

and got this as output I have also change the format = '%Y%m%d.0':

ValueError: time data '41245.0' does not match format '%d%m%Y.0' (match)

it looks like nothing works or may be I just did something wrong, don't know how to fix this.Thanks in advance!

FObersteiner
  • 22,500
  • 8
  • 42
  • 72

1 Answers1

3

I'd assume these floating point values represent dates as Excel handles them internally, i.e. days since 1900-01-01:

enter image description here

To convert this format to Python/pandas datetime, you can do so by setting the appropriate origin and unit:

df['product_first_sold_date'] = pd.to_datetime(df['product_first_sold_date'], 
                                               origin='1899-12-30',
                                               unit='D')

...which gives for the provided example

0   2012-12-02
1   2014-03-03
2   1999-07-20
3   1998-12-16
4   2015-08-10
Name: product_first_sold_date, dtype: datetime64[ns]

Important to note here (see @chux-ReinstateMonica's comment) is that 1900-01-01 is day 1 in Excel, not day zero (which you have to provide as origin). Day zero is 1899-12-30; in case you wonder why it's not 1899-12-31, the explanation is quite interesting, you can find more info here.

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • 1
    I suspect an off by [1](https://en.wikipedia.org/wiki/Leap_year_problem#:~:text=Microsoft%20Excel%20has%2C%20since%20its,the%20purpose%20of%20backward%20compatibility.) or 2 (0 is Jan 0). My excel 41245.0 --> `12/2/2012` – chux - Reinstate Monica Sep 25 '20 at 21:25
  • 1
    @chux-ReinstateMonica: you're absolutely right, my fault not cross-checking this in Excel! The origin to use here seems 1899-12-30 - which I've also come across before, just can't remember the context right now ^^ might have been Lotus; as e.g. mentioned [here](https://stackoverflow.com/questions/3963617/why-is-1899-12-30-the-zero-date-in-access-sql-server-instead-of-12-31) – FObersteiner Sep 25 '20 at 22:07