
I have generated a DatetimeIndex which looks like:

DatetimeIndex(['1970-01-01 09:30:00.015105074',
               '1970-01-01 09:30:00.059901970',
               '1970-01-01 09:30:00.113246707',
               '1970-01-01 09:30:00.113246707',
               '1970-01-01 09:30:00.113246707',
               '1970-01-01 09:30:00.113246707',
               '1970-01-01 09:30:00.113246707',
               '1970-01-01 09:30:00.154178213',
               '1970-01-01 09:30:00.173594287',
               '1970-01-01 09:30:00.202322801',
               ...
               '1970-01-01 15:59:59.544086847',
               '1970-01-01 15:59:59.544121155',
               '1970-01-01 15:59:59.544124809',
               '1970-01-01 15:59:59.544125669',
               '1970-01-01 15:59:59.544126313',
               '1970-01-01 15:59:59.544129843',
               '1970-01-01 15:59:59.544131783',
               '1970-01-01 15:59:59.544132627',
               '1970-01-01 15:59:59.544133264',
               '1970-01-01 15:59:59.871751084'],
              dtype='datetime64[ns]', name=0, length=112673, freq=None)

This has been generated using the code:

GOOG_msg_df = pd.read_csv('GOOG_msg_5.csv', header = None, index_col = 0)
pd.to_datetime(GOOG_msg_df.index, unit = 's')

I wish to extract only the time component (leave the date out). I tried the following:

pd.Series(pd.to_datetime(GOOG_msg_df.index, unit = 's').time)

and I get:

0         09:30:00.015105
1         09:30:00.059901
2         09:30:00.113246
3         09:30:00.113246
4         09:30:00.113246
               ...       
112668    15:59:59.544129
112669    15:59:59.544131
112670    15:59:59.544132
112671    15:59:59.544133
112672    15:59:59.871751
Length: 112673, dtype: object

The issue with this method is that the dtype is object instead of datetime64[ns].

Is there a way to extract only the time component while maintaining the datetime64[ns] dtype? This would allow me to perform operations that rely on this dtype. For example:

pd.to_datetime(GOOG_msg_df.index, unit = 's') > pd.Timestamp('1970-01-01 10:00:00')
>>> array([False, False, False, ...,  True,  True,  True])
QuantCode
  • Does this answer your question? [pandas: extract date and time from timestamp](https://stackoverflow.com/questions/39662149/pandas-extract-date-and-time-from-timestamp) – Sabsa Jul 21 '22 at 06:44

1 Answer


Create timedeltas instead of a time column if you need to process the values later:

s = pd.Series(pd.to_timedelta((pd.to_datetime(GOOG_msg_df.index, unit = 's').time).astype(str)))
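As an alternative sketch (not part of the original answer), the time of day can also be obtained by subtracting each timestamp's midnight with `normalize()`. This avoids the round trip through strings, and since `.time` returns `datetime.time` objects that only carry microseconds, it also keeps the nanosecond digits. The sample values below are copied from the question's output; `idx`, `time_of_day` and `after_ten` are illustrative names.

```python
import pandas as pd

# Sample values copied from the DatetimeIndex shown in the question.
idx = pd.DatetimeIndex([
    "1970-01-01 09:30:00.015105074",
    "1970-01-01 09:30:00.059901970",
    "1970-01-01 15:59:59.871751084",
])

# normalize() sets each timestamp to midnight of its own day, so the
# difference is the time of day as a TimedeltaIndex (timedelta64[ns]).
time_of_day = idx - idx.normalize()
s = pd.Series(time_of_day)

# Comparisons then use a Timedelta threshold instead of a Timestamp.
after_ten = s > pd.Timedelta(hours=10)
print(s.dtype)
print(after_ten.tolist())
```

The comparison from the question carries over directly: the threshold becomes `pd.Timedelta(hours=10)` rather than `pd.Timestamp('1970-01-01 10:00:00')`.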
jezrael
  • I get the following error: AttributeError: 'TimedeltaIndex' object has no attribute 'time' @jezrael – QuantCode Jul 21 '22 at 07:10
  • @cath_py - Can you test `pd.Series(pd.to_timedelta((GOOG_msg_df.index, unit = 's').time).astype(str))` ? – jezrael Jul 21 '22 at 07:11
  • I first get `SyntaxError: invalid syntax` at `=` in `unit = 's'`. I think the correction will be `pd.Series(pd.to_timedelta(GOOG_msg_df.index, unit = 's').time).astype(str)` but despite that I get the same `AttributeError: 'TimedeltaIndex' object has no attribute 'time'`. @jezrael – QuantCode Jul 21 '22 at 07:18
  • @cath_py - can you test now? – jezrael Jul 21 '22 at 07:22
  • @jezrael - now I get the form `0 days 09:30:00.015105074` with dtype `timedelta64[ns]`. This is better than having the Epoch date in the original DatetimeIndex. Not sure if it is possible to get rid of the date / days completely. – QuantCode Jul 21 '22 at 07:30
  • @cath_py - what is the reason for removing `0 days`? Is it the output format? If you need to process the values later, the timedeltas work well. In other words, why is it a problem to use `pd.Series(pd.to_datetime(GOOG_msg_df.index, unit = 's').time)`? – jezrael Jul 21 '22 at 10:08
  • yes, output format is the primary reason (not that big of an issue though) – QuantCode Jul 21 '22 at 11:44
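
If the `0 days` prefix matters only for display, one possible approach is to keep the `timedelta64[ns]` values for computation and build a separate string column for output. The sketch below is an assumption about what "output format" means here. It relies on `str()` of a `Timedelta` putting the clock part last, and `tds` and `labels` are illustrative names.

```python
import pandas as pd

# Two time-of-day values taken from the question, stored as timedeltas.
tds = pd.Series(pd.to_timedelta(["09:30:00.015105074", "15:59:59.871751084"]))

# str(Timedelta) looks like '0 days 09:30:00.015105074'; keep only the
# last whitespace-separated token for display. The result is plain
# strings (object dtype), so use it for output, not for arithmetic.
labels = pd.Series([str(td).split()[-1] for td in tds], index=tds.index)
print(labels.tolist())
```

The original `tds` series stays available for comparisons and arithmetic.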