I have a dataframe that I grouped with the groupby function. To do so, I had to use a DatetimeIndex. However, I would like to convert my DatetimeIndex to integers so I can use it as the index for a dynamic optimization model. I am able to convert the DatetimeIndex to float, but not to integers that differentiate the hours.


# My data look like this:

                            Date  Hour  MktDemand   HOEP  hour
Datetime                                                        
2019-01-01 01:00:00  2019-01-01     1      16231   0.00     0   
2019-01-01 02:00:00  2019-01-01     2      16051   0.00     1   
2019-01-01 03:00:00  2019-01-01     3      15805  -0.11     2   
2019-01-01 04:00:00  2019-01-01     4      15580  -1.84     3   
2019-01-01 05:00:00  2019-01-01     5      15609  -0.47     4   
...



import datetime as dt
import pandas as pd

df['Datetime'] = pd.to_datetime(df.Date) + pd.to_timedelta(df.Hour, unit='h')
df['datetime'] = pd.to_datetime(df.Date) + pd.to_timedelta(df.Hour, unit='h')
grouped = df.set_index('Datetime').groupby(pd.Grouper(freq="15d"))


for name, group in grouped:
    print(pd.to_numeric(group.index, downcast='integer'))

# It returns this:
Int64Index([1546304400000000000, 1546308000000000000, 1546311600000000000,
            1546315200000000000, 1546318800000000000, 1546322400000000000,
            1546326000000000000, 1546329600000000000, 1546333200000000000,
            1546336800000000000,
              ...

# However, I would like to have integers in this format:

20190523
20190524

# I tried this but it doesn't work:
for name, group in grouped:
    print(pd.to_timedelta(group.index).dt.total_hours().astype(int))


ERROR: dtype datetime64[ns] cannot be converted to timedelta64[ns]

FObersteiner
Philippe

1 Answer


The integers you expect represent a formatted datetime; they are not an actual numeric representation of the datetime (which is what pd.to_numeric gives you, as nanoseconds since 1970-01-01 UTC).

Therefore, you'll want to format to string and then convert to integer.

Example:

import pandas as pd
# some synthetic example data...
dti = pd.date_range("2015", "2016", freq='d')
df = pd.DataFrame({'some_value': [i for i in range(len(dti))]})
grouped = df.set_index(dti).groupby(pd.Grouper(freq="15d"))

for name, group in grouped:
    print(group.index.strftime('%Y%m%d').astype(int))
    
# gives you e.g.
Int64Index([20150101, 20150102, 20150103, 20150104, 20150105, 20150106,
        20150107, 20150108, 20150109, 20150110, 20150111, 20150112,
        20150113, 20150114, 20150115],
       dtype='int64')
...

You could also extend the strftime format string with further directives, such as %H for hours or %M for minutes.
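As a minimal sketch of adding the hour, assuming an hourly index like the one in the question (the three timestamps below are made up for illustration, and `idx` and `as_int` are hypothetical names):

```python
import pandas as pd

# Hypothetical hourly timestamps standing in for the question's index
idx = pd.to_datetime([
    "2019-01-01 01:00",
    "2019-01-01 02:00",
    "2019-01-01 03:00",
])

# '%H' appends the zero-padded hour (00-23) after the date part;
# the formatted strings are then cast to 64-bit integers
as_int = idx.strftime("%Y%m%d%H").astype("int64")

print(list(as_int))
```

The cast is written as "int64" rather than int because values that also include minutes (for example with '%Y%m%d%H%M') no longer fit into a 32-bit integer.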

FObersteiner
  • After I create grouped, which is chunk of data from the original dataframe, it returns a tuple which has no attribute index or strftime. Then I cannot use this method. – Philippe Jan 23 '21 at 14:47
  • @Philippe: it should not be all too complicated to adapt my example - e.g. tried `group.index.strftime('%Y%m%d').astype(int)` in your `for name, group in grouped:` loop? – FObersteiner Jan 23 '21 at 16:27
  • It worked! I added ('%Y%m%d%H') and got exactly like in the answer above plus the hour. Thank you! – Philippe Jan 24 '21 at 17:59
  • @Philippe: glad if I could help, make sure to [accept an answer](https://stackoverflow.com/help/someone-answers) as solution if it solved your problem. – FObersteiner Jan 24 '21 at 18:57