I have a big DataFrame with a date_time index (for example, 2022-09-30 15:45:00), with one row of data per minute:

row1  2022-09-10 10:05      data1 data2 data3
row2  2022-09-10 10:06      data4 data5 data6    etc...

The index range is, for example, "2022-09-01" to "2022-10-31"; it could span several weeks, several months, or even a few years (2 or 3).

I want to split this DataFrame into several smaller DataFrames, such that:
  • each one contains only data from a single day;
  • the name of each new DataFrame includes a reference to the date, for example XXX-2022-09-15;
  • each one holds only part of that day's data, for example from 9:45 to 10:30, with the same hour range for all the new DataFrames.

I tried with DatetimeIndex, but I had trouble understanding how it works.

mozway
GAO
  • Your input is unclear: what are the column names? What is the index? – mozway Feb 07 '23 at 16:38
  • Thanks for your comments. The column names are temp, high, low, speed, and the data are mainly numerical. The index is date_time (2022-09-30 15:45:00), read from a .csv file – GAO Feb 07 '23 at 17:43

1 Answer

Edits: make the result a dict, and truncate each day

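import pandas as pd

# offsets from midnight that bound the window kept for each day
t0, t1 = pd.Timedelta('09:45:00'), pd.Timedelta('10:30:00')

# one entry per calendar day, keyed by date string;
# truncate keeps rows from day+t0 to day+t1, both ends included
byday = {
    f'{day:%Y-%m-%d}': d.truncate(before=day+t0, after=day+t1)
    for day, d in df.groupby(pd.Grouper(freq='D'))
}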
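
As an alternative sketch (not the approach used above), the same window can be selected with `DataFrame.between_time` before grouping by calendar day. One difference: days without any rows in the window never appear in the result, whereas `pd.Grouper(freq='D')` also yields empty days. The sample data and the name `byday_alt` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per minute over three full days
ix = pd.date_range('2022-09-10', periods=3 * 24 * 60, freq='min')
df = pd.DataFrame(
    np.random.uniform(0, 1, (len(ix), 3)), columns=list('abc'), index=ix)

# Keep only rows whose time of day is within the window
# (both endpoints are included by default)
window = df.between_time('09:45', '10:30')

# Group the remaining rows by calendar day (midnight-normalized timestamps)
byday_alt = {
    f'{day:%Y-%m-%d}': d
    for day, d in window.groupby(window.index.normalize())
}
```

Each value is an ordinary DataFrame, so `byday_alt['2022-09-10']` can be used like any other frame.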

Example

import numpy as np
import pandas as pd

ix = pd.date_range(
    # note: partial day at the beginning
    '2022-09-10 01:44:00', '2023-01-01', freq='min', inclusive='left')
df = pd.DataFrame(
    np.random.uniform(0,1, (len(ix), 3)), columns=list('abc'),
    index=ix)

# code above

>>> list(byday)[0]
'2022-09-10'

>>> list(byday)[-1]
'2022-12-31'

>>> byday['2022-09-10']
                            a         b         c
2022-09-10 09:45:00  0.247076  0.687310  0.597638
2022-09-10 09:46:00  0.307722  0.753229  0.329068
2022-09-10 09:47:00  0.865848  0.075505  0.268435
...                       ...       ...       ...
2022-09-10 10:28:00  0.383779  0.523062  0.622288
2022-09-10 10:29:00  0.633321  0.105336  0.570100
2022-09-10 10:30:00  0.123475  0.044391  0.802064

>>> byday['2022-12-31']
                            a         b         c
2022-12-31 09:45:00  0.189360  0.812205  0.466228
2022-12-31 09:46:00  0.471459  0.490481  0.903464
2022-12-31 09:47:00  0.279801  0.885283  0.275511
...                       ...       ...       ...
2022-12-31 10:28:00  0.558043  0.692632  0.122300
2022-12-31 10:29:00  0.034136  0.037672  0.020361
2022-12-31 10:30:00  0.205017  0.721944  0.030551
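
For the CSV described in the comments (index date_time; columns temp, high, low, speed), a minimal sketch of the whole pipeline might look like the following. The CSV content, the `prefix` variable, and the choice to skip empty days are assumptions for illustration; `XXX` is the placeholder prefix from the question.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file
csv_text = """date_time,temp,high,low,speed
2022-09-30 09:44:00,20.1,20.5,19.8,3.2
2022-09-30 09:45:00,20.2,20.6,19.9,3.1
2022-09-30 10:30:00,20.8,21.0,20.1,2.9
2022-09-30 10:31:00,20.9,21.1,20.2,2.8
2022-10-01 10:00:00,18.4,18.9,18.0,4.0
"""

# parse_dates + index_col produce a DatetimeIndex;
# sort_index because truncate expects a sorted index
df = pd.read_csv(
    io.StringIO(csv_text), parse_dates=['date_time'], index_col='date_time'
).sort_index()

t0, t1 = pd.Timedelta('09:45:00'), pd.Timedelta('10:30:00')
prefix = 'XXX'  # placeholder prefix taken from the question

byday = {
    f'{prefix}-{day:%Y-%m-%d}': d.truncate(before=day + t0, after=day + t1)
    for day, d in df.groupby(pd.Grouper(freq='D'))
    if not d.empty  # skip calendar days that have no rows at all
}
```

With a real file, `io.StringIO(csv_text)` would be replaced by the file path.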
Pierre D
  • Thanks a lot for your answer, I'll try to implement using your inputs. – GAO Feb 07 '23 at 17:45
  • I haven't finished trying it yet; I'm still working on it. I'll let you know when I have some results. – GAO Feb 13 '23 at 10:07
  • I finished implementing it today and it works. Now I need to understand all the parts of the code you provided. Have a nice day – GAO Feb 14 '23 at 09:13