
I was tinkering with converting a pandas.Timestamp to the built-in Python datetime. I tried the following:

import datetime
import pandas as pd

pandas_time = pd.Timestamp(year=2020, day=4, month=2, hour=0, minute=0, second=1)
pandas_ts = pandas_time.timestamp()
datetime_time = datetime.datetime.fromtimestamp(pandas_ts)
datetime_ts = datetime_time.timestamp()

Looking at the variables gives this:

pandas_time:   2020-02-04 00:00:01
datetime_time: 2020-02-04 01:00:01
pandas_ts:   1580774401.0
datetime_ts: 1580774401.0

So they both have the same timestamp but the dates differ by one hour. When I tried it the other way around, I got this:

datetime_time = datetime.datetime(year=2020, day=4, month=2, hour=0, minute=0, second=1)
datetime_ts = datetime_time.timestamp()
pandas_time = pd.Timestamp(datetime_time)
pandas_ts = pandas_time.timestamp()

pandas_time:   2020-02-04 00:00:01
datetime_time: 2020-02-04 00:00:01
pandas_ts:   1580774401.0
datetime_ts: 1580770801.0

Now the dates are the same but the timestamps differ by 3600 seconds (1 hour). I know I could use pandas' to_pydatetime to convert a pandas Timestamp to a Python datetime, but I'm wondering why this difference occurs. Are their starting points defined differently? And if so, why?

debsim
  • simple but important difference: a naive pandas Timestamp is treated as UTC by default, while a naive Python datetime is treated as local time. Note that Unix time should always refer to UTC, which is why you observe the difference. – FObersteiner Feb 04 '21 at 16:55

1 Answer


The datetime documentation spells this out. For date.fromtimestamp(timestamp) it says (datetime.datetime.fromtimestamp behaves the same way when no tz is passed):

classmethod date.fromtimestamp(timestamp) Return the local date corresponding to the POSIX timestamp, such as is returned by time.time().

So it returns local time, nothing more. You have to tell it to use UTC explicitly, whereas pandas treats a naive Timestamp as UTC by default.

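import datetime
import pandas as pd

pandas_time = pd.Timestamp(year=2020, month=2, day=4, hour=0, minute=0, second=1)
pandas_ts = pandas_time.timestamp()
# tz=UTC makes fromtimestamp return an aware UTC datetime instead of local time
datetime_time = datetime.datetime.fromtimestamp(pandas_ts, tz=datetime.timezone.utc)
datetime_ts = datetime_time.timestamp()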
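
To see why one timestamp can print as two different wall-clock times, here is a small standard-library sketch. It assumes the asker's local zone is a fixed UTC+1 offset (the name `plus_one` is just for illustration); the real local zone on your machine may differ.

```python
from datetime import datetime, timedelta, timezone

ts = 1580774401.0  # the POSIX timestamp from the question

# Render the instant in UTC.
utc_view = datetime.fromtimestamp(ts, tz=timezone.utc)

# Assumption: a fixed UTC+1 offset stands in for the asker's local zone.
plus_one = timezone(timedelta(hours=1))
local_view = datetime.fromtimestamp(ts, tz=plus_one)

print(utc_view.isoformat())    # 2020-02-04T00:00:01+00:00
print(local_view.isoformat())  # 2020-02-04T01:00:01+01:00

# Both objects describe the same instant, so they compare equal
# and map back to the same timestamp.
print(utc_view == local_view)          # True
print(local_view.timestamp() == ts)    # True
```

The timestamp never changes; only the zone used to display it does.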

Similarly for the second case

# tzinfo=UTC pins the wall-clock time to UTC instead of local time
datetime_time = datetime.datetime(year=2020, day=4, month=2, hour=0, minute=0, second=1, tzinfo=datetime.timezone.utc)
datetime_ts = datetime_time.timestamp()
pandas_time = pd.Timestamp(datetime_time)
pandas_ts = pandas_time.timestamp()
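
The 3600-second gap in the second experiment can be reproduced with the standard library alone. This is only a sketch: it attaches an explicit zone to the same naive wall-clock time, once as UTC (how pandas treats a naive Timestamp) and once as a fixed UTC+1 offset (an assumed stand-in for the asker's local zone).

```python
from datetime import datetime, timedelta, timezone

# The naive wall-clock time used in the question.
naive = datetime(2020, 2, 4, 0, 0, 1)

# Interpret the same wall-clock reading in two different zones.
as_utc = naive.replace(tzinfo=timezone.utc).timestamp()
# Assumption: fixed UTC+1 offset in place of the real local zone.
as_plus_one = naive.replace(tzinfo=timezone(timedelta(hours=1))).timestamp()

print(as_utc)                # 1580774401.0
print(as_plus_one)           # 1580770801.0
print(as_utc - as_plus_one)  # 3600.0
```

Midnight in UTC+1 happens one hour before midnight in UTC, so its timestamp is 3600 seconds smaller.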

From your question it seems you are living in a UTC+1 country :p

Epsi95
  • you should emphasize that pandas takes UTC as the default (for naive datetimes) ;-) that's a pretty drastic difference from taking local time (as Python datetime does). One language, two approaches. If you want my personal opinion: the local-time thing is just painful. – FObersteiner Feb 04 '21 at 16:57
  • I assumed it would be because of some kind of difference in definition but didn't think of timezones, thanks for the answer – debsim Feb 04 '21 at 17:01