How to calculate monthly mean of a time seies data and substract the monthly mean with the values of that month of each year?

Question

Dataset is a daily time series of 9 variables on a daily scale
I have extracted the dataset

Data = pd.read_csv('city10.csv', header = None)
Data['Date'] = pd.date_range(start='1/1/1951', periods=len(Data), freq='D')
Data.set_index('Date', inplace=True)

It looks like this

 Date         0    1       2       3  ...       5       6         7        8                                  
1951-01-01  28.361  0.0  131.24  405.39  ...  405.39  38.284  0.187010 -1.23550  
1951-01-02  27.874  0.0  113.74  409.56  ...  409.56  49.834  0.066903 -1.44770  
           ...  ...     ...     ...  ...     ...     ...       ...      ...  
2005-12-16  27.921  0.0  104.99  429.78  ...  429.78  47.529 -1.814300 -5.47720  
2005-12-17  27.918  0.0  112.11  425.32  ...  425.32  46.541 -3.314000 -4.02050

After this I found the month mean of the entire dataet i.e.

Data.groupby(Data.index.month).mean()

The result is

              0         1           2  ...          6         7         8                              
1     29.619322  0.215978  108.621532  ...  45.868395 -0.234236 -1.865947
2     32.404500  0.290335   95.270385  ...  43.443624  0.554149 -2.360776
3     35.131266  0.364438   78.907920  ...  42.065113  1.458203 -2.636451
4     36.631282  0.998401   53.663939  ...  44.239469  3.146849 -2.193416
5     36.823308  2.113330   37.917831  ...  54.287356  5.241153 -0.694375
6     34.444513  2.195926   35.315554  ...  67.840239  6.393643  0.689087
7     32.951826  3.567160   32.466668  ...  82.347247  6.583195  1.183262
8     32.644236  4.053641   36.379228  ...  85.056697  5.102383  0.005426
9     32.205442  4.885259   50.595568  ...  80.335829  2.413891 -0.578568
10    30.448266  5.748111   79.575731  ...  67.582589 -0.769297 -0.614057
11    28.748315  4.350384  100.293532  ...  53.418955 -1.258580 -1.023143
12    28.155611  1.524177  109.510292  ...  51.317731 -0.936495 -1.549105

Now,how to subtract the mean of each month with the respective values of that month of each year.
For e.g. January month mean for time-series 1951-2005 is 20.25
This mean has to be subtracted from daily values of all January month. How to do this?

Note that you are more likely to get quick answers if you make up a short example and include the code that generates this example, see the beginning of my answer for an example. — yardsale8, Feb 07 '20 at 15:25
values = pd.read_csv('city10.csv', header = None) values['Date'] = pd.date_range(start='1/1/1951', periods=len(values), freq='D') values.set_index('Date', inplace=True) values.groupby(values.index.month).mean() — imran khan, Feb 07 '20 at 15:34

yardsale8 · Answer 1 · 2020-02-07T15:52:21.717

Original answer -- difference between data and this month's average

I would use pandas to complete this task, as it makes it easy to aggregate by date.

First, let's make an example data frame and add a month .

In [45]: import pandas as pd

In [46]: import numpy as np

In [47]: start = datetime.datetime(2011, 1, 1)

In [48]: end = datetime.datetime(2012, 1, 1)

In [49]: df = pd.DataFrame({'date':pd.date_range(start, periods=1000, freq='D'), 'x':np.random.normal(5,1,1000)})
In [84]: df = pd.DataFrame({'date':pd.date_range(start, periods=1000, freq='D'), 'x':np.random.normal(5,1,1000)})

In [86]: df['month'] = df.date.dt.month

In [87]: df.head()
Out[87]:
        date         x  month
0 2011-01-01  5.139113      1
1 2011-01-02  3.774586      1
2 2011-01-03  6.095986      1
3 2011-01-04  5.037072      1
4 2011-01-05  5.871760      1
2011-01-05 2011-01-05  6.308203

Now we can create a new data frame that contains the monthly averages using resample and mean.

In [58]: monthly_mean = df.resample('M').mean()

In [59]: monthly_mean.head()
Out[59]:
                   x
date
2011-01-31  4.702853
2011-02-28  5.088545
2011-03-31  5.261777
2011-04-30  4.982984
2011-05-31  4.791729

We can compute the o

Next, we need to join the two data frames together to line up the data with the monthly averages. To make this easier, I will create a year and month column in each data frame that will be used in the join/merge.

In [60]: df['month'] = df.index.month

In [61]: monthly_mean['month'] = monthly_mean.index.month

In [62]: df['year'] = df.index.year

In [63]: monthly_mean['year'] = monthly_mean.index.year

In [64]: df_joined = pd.merge(df, monthly_mean, how='left', on=('year', 'month'))

In [65]: df_joined.head()
Out[65]:
        date       x_x  month  year       x_y
0 2011-01-01  5.388197      1  2011  4.702853
1 2011-01-02  6.442878      1  2011  4.702853
2 2011-01-03  5.979076      1  2011  4.702853
3 2011-01-04  2.846689      1  2011  4.702853
4 2011-01-05  5.103524      1  2011  4.702853

Finally, the new column can be constructed by subtracting columns.

In [66]: df_joined['month_diff'] = df_joined.x_x - df_joined.x_y

In [67]: df_joined.head()
Out[67]:
        date       x_x  month  year       x_y  month_diff
0 2011-01-01  5.388197      1  2011  4.702853    0.685344
1 2011-01-02  6.442878      1  2011  4.702853    1.740025
2 2011-01-03  5.979076      1  2011  4.702853    1.276223
3 2011-01-04  2.846689      1  2011  4.702853   -1.856164
4 2011-01-05  5.103524      1  2011  4.702853    0.400670

EDIT: If you want the difference with the historic monthly averages, make the following changes.

Add the month, group by, and aggregate to get the monthly averages.

In [88]: monthly_mean = df.groupby('month').agg('mean')

Now the process proceed as before, join, this time just by 'month', and compute the difference.

In [90]: df_joined = pd.merge(df, monthly_mean, how='left', on='month')

In [91]: df_joined.head()
Out[91]:
        date       x_x  month       x_y
0 2011-01-01  5.139113      1  4.972604
1 2011-01-02  3.774586      1  4.972604
2 2011-01-03  6.095986      1  4.972604
3 2011-01-04  5.037072      1  4.972604
4 2011-01-05  5.871760      1  4.972604

In [92]: df_joined['month_diff'] = df_joined.x_x - df_joined.x_y

In [93]: df_joined.head()
Out[93]:
        date       x_x  month       x_y  month_diff
0 2011-01-01  5.139113      1  4.972604    0.166509
1 2011-01-02  3.774586      1  4.972604   -1.198018
2 2011-01-03  6.095986      1  4.972604    1.123382
3 2011-01-04  5.037072      1  4.972604    0.064468
4 2011-01-05  5.871760      1  4.972604    0.899156

I have several years i.e. from 1951-2005. I will get only 12 mean (Monthly mean of the entire dataset) in my case. Now how to subtract those means to the respective days of that month of each year. — imran khan, Feb 07 '20 at 15:37
You were almost there with your updated question, just needed to join the old and new tables by month. — yardsale8, Feb 07 '20 at 15:54

score 0 · Answer 2 · answered Feb 07 '20 at 15:58

0

Thank you, everyone. I am able to solve the problem.
I hope it is right.

Anomaly_Values = Data.sub(Data.groupby([Data.index.month]).transform('mean'))

Let me know if there is any problem in the solution.

answered Feb 07 '20 at 15:58

imran khan

105
5

How to calculate monthly mean of a time seies data and substract the monthly mean with the values of that month of each year?

2 Answers2

Linked