1

I have a data set and needs to calculate daily average and weekly average. I know this can be done by Pandas. Below is the data and code that I have till now;

date                T1      T2      T3
12/17/13 00:28:38   19      23.1    7.3
12/17/13 00:58:38   19      22.9    7.3
12/17/13 01:28:38   18.9    22.8    6.3
12/17/13 01:58:38   18.9    23.1    6.3
12/17/13 02:28:38   18.8    23      6.3
12/17/13 02:58:38   18.8    22.9    6.3
.......
12/18/13 00:28:07   19.5    22.4    5.3
12/18/13 00:58:08   19.4    22.3    5.3
12/18/13 01:28:07   19.4    22.1    5.3
.......
3/22/14 16:55:18    17.7    20.6    10.1
3/22/14 17:08:31    17.7    20.6    10.1
3/22/14 17:26:04    17.6    20.5    8
3/22/14 17:56:04    17.7    20.5    7

and the code that I have till now is;

import pandas as pd
Temp=pd.read_csv("Book1.csv",parse_dates=['date'])  
Temp=Temp.set_index('date')
In [25]: Temp_plot.head()
Temp_plot=Temp.resample('W',how='mean')
Temp_plot.head()
Out[25]:
T1  T2  T3
date            
2013-12-22  18.740345   35.055517   7.532414
2013-12-29  14.501770   14.950442   6.497935
2014-01-05  13.135207   14.064793   7.795858
2014-01-12  17.296154   38.503550   7.827219
2014-01-19  18.217699   38.892625   6.952212

The problem is now I have to delete some mean weekly values that were holidays and should not be included in the resulted mean values. Shall I have one list that contains the dates that should not be included and then comparing values in the Temp_plot?

EDIT

I have added a list Wase = ["2013-12-22","2014-01-05"], as suggested in comments and used Temp_plot1 = Temp_plot.drop(Wase) Now I got any error, which says ValueError: labels ['2013-12-22' '2014-01-05'] not contained in axis. Any idea how to remove this error as I have to delete row that contains dates contain in list.

Muhammad
  • 305
  • 2
  • 6
  • 20
  • If you have a container (e.g. `list`) of dates that you want to exclude called, say, `unwanted_dates` you can just do `Temp_plot.drop(unwanted_dates)`. Note this returns a view with the desired dates excluded and doesn't actually alter `Temp_plot`. To drop them permanently do `Temp_plot = Temp_plot.drop(unwanted_dates)` or `Temp_plot.drop(unwanted_dates, inplace=True)` – JoeCondron Sep 27 '15 at 13:34
  • @JoeCondron I have tried to do that but gives me an error that `ValueError: labels ['2013-12-22' '2014-01-05'] not contained in axis`. Any ideas why this is happening? – Muhammad Sep 27 '15 at 15:02
  • If your data did not have any observations during the holidays, then they are not included in the mean anyway. If you do have holidays in your data, then you should remove them before resampling. – Alexander Sep 27 '15 at 15:21
  • Also, you can set your index when you read the csv file: `pd.read_csv("Book1.csv", parse_dates=['date'], index_col='date')` – Alexander Sep 27 '15 at 15:24
  • @Alexander The data does contain holidays and holidays should be removed from the data. If I have to remove it before resampling then there will be so many rows and it will be difficult. Can't I drop them as suggested by @JoeCondron? – Muhammad Sep 27 '15 at 15:27
  • Which holiday calendar? U.S. Federal Holidays or something else? – Alexander Sep 27 '15 at 15:30
  • Well, its holidays in a School in UK. But it is not just holidays' week that I will be deleting, I will be deleting some other weeks as well. So the best idea would be to have a list that contain all those dates and then drop them after resampling. Which I tried to do but got error. – Muhammad Sep 27 '15 at 15:37

1 Answers1

0

You need to create a calendar of holidays using dt.date(year, month, day). Then you filter the holidays from the index using a list comprehension structure as shown below. Lastly, you select these filtered dates using .ix which selects data from a dataframe based on the index value.

import datetime as dt

holidays = [dt.date(2015, 12, 25), ...] 
idx = [timestamp for timestamp in Temp.index if timestamp.date() not in holidays]
Temp_plot = Temp.ix[idx].resample('W', how='mean')
Alexander
  • 105,104
  • 32
  • 201
  • 196
  • Thanks for your time but it is not working. I have added `holidays = [dt.date(2013, 12, 17), (2013, 12, 18), (2013, 12, 19), (2013, 12, 20), (2013, 12, 21), (2013, 12, 22)]` `Temp=Temp.set_index('date')` and same lines like yours `idx=...` and `Temp_plot=...` but it gave same results as before meaning the holidays are not removed. I have also tried to to sampling on daily basis before doing it on weekly basis but this ain't give me any different results. Your help will be appreciated. – Muhammad Sep 27 '15 at 16:09
  • because...? What is the error. My guess is the format of your date. What is `type(Temp.index.iat[0])`? – Alexander Sep 27 '15 at 16:11
  • The format of date in the file is 12/17/13 00:28:38 M/D/Y and HH:MM:YY. There is no error. – Muhammad Sep 27 '15 at 16:17
  • Do I have to change it to 2013-12-22? – Muhammad Sep 27 '15 at 16:18
  • You need dt.date(...) for each date in your holidays. Or you can do this: `holidays = [dt.date(d[0], d[1], d[2]) for d in [(2013, 12, 18), (2013, 12, 19), (2013, 12, 20), (2013, 12, 21), (2013, 12, 22)]]`. Also, I tried the code using your sample data and it works for me. – Alexander Sep 27 '15 at 16:29