How to split a pandas dataframe or series by day (possibly using an iterator)

Question

I have a long time series, eg.

import pandas as pd
index=pd.date_range(start='2012-11-05', end='2012-11-10', freq='1S').tz_localize('Europe/Berlin')
df=pd.DataFrame(range(len(index)), index=index, columns=['Number'])

Now I want to extract all sub-DataFrames for each day, to get the following output:

df_2012-11-05: data frame with all data referring to day 2012-11-05
df_2012-11-06: etc.
df_2012-11-07
df_2012-11-08
df_2012-11-09
df_2012-11-10

What is the most effective way to do this avoiding to check if the index.date==give_date which is very slow. Also, the user does not know a priory the range of days in the frame.

Any hint do do this with an iterator?

My current solution is this, but it is not so elegant and has two issues defined below:

time_zone='Europe/Berlin'
# find all days
a=np.unique(df.index.date) # this can take a lot of time
a.sort()
results=[]
for i in range(len(a)-1):
    day_now=pd.Timestamp(a[i]).tz_localize(time_zone)
    day_next=pd.Timestamp(a[i+1]).tz_localize(time_zone)
    results.append(df[day_now:day_next]) # how to select if I do not want day_next included?

# last day
results.append(df[day_next:])

This approach has the following problems:

a=np.unique(df.index.date) can take a lot of time
df[day_now:day_next] includes the day_next, but I need to exclude it in the range

score 42 · Answer 1 · answered Jun 05 '17 at 06:47

42

If you want to group by date (AKA: year+month+day), then use df.index.date:

result = [group[1] for group in df.groupby(df.index.date)]

As df.index.day will use the day of the month (i.e.: from 1 to 31) for grouping, which could result in undesirable behavior if the input dataframe dates extend to multiple months.

answered Jun 05 '17 at 06:47

Peque

13,638
11
69
105

5

This should be the accepted answer, grouping by days is usually not what you want. – shahar_m Jan 05 '19 at 18:36
1

The accepted anwer is wrong. Please upvote this answer as I can't if you are reading this and you agree – hakeem May 27 '20 at 20:10

Woody Pride · Accepted Answer · 2014-02-06T15:31:08.227

24

Perhaps groupby?

DFList = []
for group in df.groupby(df.index.day):
    DFList.append(group[1])

Should give you a list of data frames where each data frame is one day of data.

Or in one line:

DFList = [group[1] for group in df.groupby(df.index.day)]

Gotta love python!

edited Feb 06 '14 at 15:31

answered Feb 06 '14 at 14:31

Woody Pride

13,539
9
48
62

9

better to group by df.index.date as then we are sure to really pick a different day, using index.day Dec 2 2001 and Jan 2 2002 would be the same day. Unfortunately, using date the computation takes visibly more time. I think what is missing in pandas is fast selection by different time/date categories, this might required to build some a priory index as it is done in databases – Mannaggia Feb 07 '14 at 09:30
2

ok, df.groupby([df.index.year,df.index.month,df.index.day]) makes it 90% faster while making sure to distinguish properly between days – Mannaggia Feb 11 '14 at 12:18
But it makes groups based on the String representation of Date. Not on the basis of Datetime representation of Date. – Rohit Singh Sep 14 '20 at 20:34

How to split a pandas dataframe or series by day (possibly using an iterator)

2 Answers2

Linked

Related