
With Pandas I have created a DataFrame from an imported .csv file (this file is generated through simulation). The DataFrame consists of half-hourly energy consumption data for a single year. I have already created a DatetimeIndex for the dates.

I would like to be able to reformat this data into average hourly weekday and weekend profiles, with the weekday profile excluding holidays.

DataFrame:

Date_Time           Equipment:Electricity:LGF   Equipment:Electricity:GF
01/01/2000 00:30    0.583979872                 0.490327348
01/01/2000 01:00    0.583979872                 0.490327348
01/01/2000 01:30    0.583979872                 0.490327348
01/01/2000 02:00    0.583979872                 0.490327348

I found an example (Getting the average of a certain hour on weekdays over several years in a pandas dataframe) that explains how to do this across several years, but not explicitly for weekdays (excluding holidays) versus weekends.

I realised that there are no resampling techniques in Pandas that do this directly; I have used several offset aliases (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) for creating monthly and daily profiles.

I was thinking of using the business day frequency to create a new date index of working days and comparing it against my DataFrame's DatetimeIndex for every half hour, then returning values for working days or weekend days depending on whether each timestamp matches, to build a new dataset. But I am not sure how to do this.

PS: I am just getting into Python and Pandas.
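The comparison sketched above can be done without a custom loop: build a business-day index with pandas' built-in 'B' frequency and test each timestamp's date against it with isin. This is only an illustrative sketch (the frame and column name here are made up); it ignores holidays, which the answer below handles with a proper holiday calendar.

```python
import numpy as np
import pandas as pd

# dummy half-hourly data covering one week
df = pd.DataFrame({'load': np.random.rand(48 * 7)},
                  index=pd.date_range('2000-01-01 00:30', periods=48 * 7, freq='30T'))

# business days (Mon-Fri, no holiday handling) spanning the data
bdays = pd.date_range(df.index.min().date(), df.index.max().date(), freq='B')

# True where a timestamp's date falls on a business day
is_workday = df.index.normalize().isin(bdays)

workweek = df[is_workday]
weekend = df[~is_workday]
```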

Chris

1 Answer

Dummy data (for future reference, you're more likely to get an answer if you post some in a copy-pasteable form):

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': np.random.randn(1000)},
                  index=pd.date_range(start='2000-01-01', periods=1000, freq='30T'))

Here's an approach. First define a US (or modify as appropriate) business day offset with holidays, and generate a range covering your dates.

from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())

bday_over_df = pd.date_range(start=df.index.min().date(), 
                             end=df.index.max().date(), freq=bday_us)

Then, develop your two grouping columns. An hour column is easy.

df['hour'] = df.index.hour

For weekday/weekend/holiday, define a function to group the data.

def group_day(date):
    if date.weekday() in [5,6]:
        return 'weekend'
    elif date.date() in bday_over_df:
        return 'weekday'
    else:
        return 'holiday'

df['day_group'] = df.index.map(group_day)

Then, just group by the two columns as you wish.

In [140]: df.groupby(['day_group', 'hour']).sum()
Out[140]: 
                       a
day_group hour          
holiday   0     1.890621
          1    -0.029606
          2     0.255001
          3     2.837000
          4    -1.787479
          5     0.644113
          6     0.407966
          7    -1.798526
          8    -0.620614
          9    -0.567195
          10   -0.822207
          11   -2.675911
          12    0.940091
          13   -1.601885
          14    1.575595
          15    1.500558
          16   -2.512962
          17   -1.677603
          18    0.072809
          19   -1.406939
          20    2.474293
          21   -1.142061
          22   -0.059231
          23   -0.040455
weekday   0     9.192131
          1     2.759302
          2     8.379552
          3    -1.189508
          4     3.796635
          5     3.471802
...                  ...
          18   -5.217554
          19    3.294072
          20   -7.461023
          21    8.793223
          22    4.096128
          23   -0.198943
weekend   0    -2.774550
          1     0.461285
          2     1.522363
          3     4.312562
          4     0.793290
          5     2.078327
          6    -4.523184
          7    -0.051341
          8     0.887956
          9     2.112092
          10   -2.727364
          11    2.006966
          12    7.401570
          13   -1.958666
          14    1.139436
          15   -1.418326
          16   -2.353082
          17   -1.381131
          18   -0.568536
          19   -5.198472
          20   -3.405137
          21   -0.596813
          22    1.747980
          23   -6.341053

[72 rows x 1 columns]
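Since the question asks for average rather than total profiles, the same groupby with .mean() in place of .sum() gives the averages. A minimal sketch of that variant (holiday handling dropped here for brevity; a plain weekday test stands in for the CustomBusinessDay calendar above):

```python
import numpy as np
import pandas as pd

# same dummy frame as above
df = pd.DataFrame(data={'a': np.random.randn(1000)},
                  index=pd.date_range(start='2000-01-01', periods=1000, freq='30T'))
df['hour'] = df.index.hour
df['day_group'] = np.where(df.index.weekday < 5, 'weekday', 'weekend')

# average (not total) value per day-group and hour
avg_profile = df.groupby(['day_group', 'hour']).mean()
```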
chrisb
  • Thanks a lot! This works great. Because I wanted half-hourly data I added df['half_hour'] = df.index.minute, combined the two into a single column with df['combined'] = df.apply(lambda x: '%02d:%02d' % (x['hour'], x['half_hour']), axis=1), and then used that column in the groupby. – Chris Jul 22 '14 at 12:08
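The half-hourly variant described in the comment above can be sketched as follows. This is an assumption-laden illustration: it reuses the dummy frame from the answer, and a simple weekday test again stands in for the holiday-aware CustomBusinessDay grouping.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': np.random.randn(1000)},
                  index=pd.date_range(start='2000-01-01', periods=1000, freq='30T'))

# build an 'HH:MM' label for each half-hour slot, as in the comment
df['hour'] = df.index.hour
df['half_hour'] = df.index.minute
df['combined'] = df.apply(lambda x: '%02d:%02d' % (x['hour'], x['half_hour']), axis=1)

# weekday/weekend split (holiday handling omitted for brevity)
df['day_group'] = np.where(df.index.weekday < 5, 'weekday', 'weekend')

# one row per day-group and half-hour slot
profile = df.groupby(['day_group', 'combined']).mean()
```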