Resample Problem:

In the example below, the time series DataFrame starts at 09:00, but after resampling with '120T' the first bin starts at 08:00!

Why? Workarounds? Thanks

import pandas as pd
import numpy as np

num_rows = 480
start_date = pd.to_datetime('2023-01-01 09:00')
datetime_range = pd.date_range(start=start_date, periods=num_rows, freq='1min')
random_values = np.random.randint(1, 101, size=num_rows)
df = pd.DataFrame({'datetime': datetime_range, 'value': random_values})

display(df)

df_resampled = df.resample('120T', on='datetime').first().reset_index()

display(df_resampled)

Results:

datetime    value
0   2023-01-01 **09:00:00** 44
1   2023-01-01 09:01:00 38
2   2023-01-01 09:02:00 55
3   2023-01-01 09:03:00 52
4   2023-01-01 09:04:00 49
... ... ...
475 2023-01-01 16:55:00 52
476 2023-01-01 16:56:00 6
477 2023-01-01 16:57:00 46
478 2023-01-01 16:58:00 96
479 2023-01-01 16:59:00 51
480 rows × 2 columns

datetime    value
0   2023-01-01 **08:00:00** 44
1   2023-01-01 10:00:00 76
2   2023-01-01 12:00:00 61
3   2023-01-01 14:00:00 27
4   2023-01-01 16:00:00 68

1 Answer

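By default, resample() uses origin='start_day', which anchors the bins at midnight of the first day. The 120-minute bins therefore fall on even hours (08:00, 10:00, ...), and the 09:00 row lands in the 08:00 bin. To obtain the desired outcome, set origin='start' in df.resample():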

df_resampled = df.resample('120T', on='datetime', origin='start').first().reset_index()

print(df_resampled.to_markdown(index=False))

Returns:

| datetime            |   value |
|:--------------------|--------:|
| 2023-01-01 09:00:00 |      40 |
| 2023-01-01 11:00:00 |      34 |
| 2023-01-01 13:00:00 |      29 |
| 2023-01-01 15:00:00 |      55 |
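
To make the difference easier to check, here is a minimal sketch that compares the bin labels directly. It assumes pandas 1.1 or later (where `origin` and `offset` were added), uses deterministic values instead of random ones so the bins are reproducible, and spells the frequency as '120min' because the 'T' alias is deprecated in recent pandas releases. The variable names are illustrative only.

```python
import pandas as pd

# Same shape as the question's data: 480 one-minute rows starting at 09:00.
# Values are just 0..479 (an assumption, to keep the example reproducible).
idx = pd.date_range("2023-01-01 09:00", periods=480, freq="1min")
df = pd.DataFrame({"datetime": idx, "value": range(480)})

# Default origin='start_day': bins are anchored at midnight, so they fall on even hours.
default_bins = df.resample("120min", on="datetime").first()

# origin='start': bins are anchored at the first timestamp in the data (09:00).
start_bins = df.resample("120min", on="datetime", origin="start").first()

# Alternative: keep the midnight anchor but shift it by one hour.
offset_bins = df.resample("120min", on="datetime", offset="1h").first()

print(default_bins.index.tolist())
print(start_bins.index.tolist())
print(offset_bins.index.tolist())
```

With this data, origin='start' and offset='1h' happen to give the same bins. They would differ if the first timestamp were not on an odd hour, since offset shifts a fixed amount from midnight while origin='start' follows the data.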
Simon David