1

Dataset is a daily time series of 9 variables on a daily scale
I have extracted the dataset

Data = pd.read_csv('city10.csv', header = None)
Data['Date'] = pd.date_range(start='1/1/1951', periods=len(Data), freq='D')
Data.set_index('Date', inplace=True)

It looks like this

 Date         0    1       2       3  ...       5       6         7        8                                  
1951-01-01  28.361  0.0  131.24  405.39  ...  405.39  38.284  0.187010 -1.23550  
1951-01-02  27.874  0.0  113.74  409.56  ...  409.56  49.834  0.066903 -1.44770  
           ...  ...     ...     ...  ...     ...     ...       ...      ...  
2005-12-16  27.921  0.0  104.99  429.78  ...  429.78  47.529 -1.814300 -5.47720  
2005-12-17  27.918  0.0  112.11  425.32  ...  425.32  46.541 -3.314000 -4.02050 

After this I found the month mean of the entire dataet i.e.

Data.groupby(Data.index.month).mean()

The result is

              0         1           2  ...          6         7         8                              
1     29.619322  0.215978  108.621532  ...  45.868395 -0.234236 -1.865947
2     32.404500  0.290335   95.270385  ...  43.443624  0.554149 -2.360776
3     35.131266  0.364438   78.907920  ...  42.065113  1.458203 -2.636451
4     36.631282  0.998401   53.663939  ...  44.239469  3.146849 -2.193416
5     36.823308  2.113330   37.917831  ...  54.287356  5.241153 -0.694375
6     34.444513  2.195926   35.315554  ...  67.840239  6.393643  0.689087
7     32.951826  3.567160   32.466668  ...  82.347247  6.583195  1.183262
8     32.644236  4.053641   36.379228  ...  85.056697  5.102383  0.005426
9     32.205442  4.885259   50.595568  ...  80.335829  2.413891 -0.578568
10    30.448266  5.748111   79.575731  ...  67.582589 -0.769297 -0.614057
11    28.748315  4.350384  100.293532  ...  53.418955 -1.258580 -1.023143
12    28.155611  1.524177  109.510292  ...  51.317731 -0.936495 -1.549105

Now,how to subtract the mean of each month with the respective values of that month of each year.
For e.g. January month mean for time-series 1951-2005 is 20.25
This mean has to be subtracted from daily values of all January month. How to do this?

imran khan
  • 105
  • 5
  • Could you please add some code that you have tried? – Rahul P Feb 07 '20 at 15:21
  • Note that you are more likely to get quick answers if you make up a short example and include the code that generates this example, see the beginning of my answer for an example. – yardsale8 Feb 07 '20 at 15:25
  • values = pd.read_csv('city10.csv', header = None) values['Date'] = pd.date_range(start='1/1/1951', periods=len(values), freq='D') values.set_index('Date', inplace=True) values.groupby(values.index.month).mean() – imran khan Feb 07 '20 at 15:34

2 Answers2

1

Original answer -- difference between data and this month's average

I would use pandas to complete this task, as it makes it easy to aggregate by date.

First, let's make an example data frame and add a month .

In [45]: import pandas as pd

In [46]: import numpy as np

In [47]: start = datetime.datetime(2011, 1, 1)

In [48]: end = datetime.datetime(2012, 1, 1)

In [49]: df = pd.DataFrame({'date':pd.date_range(start, periods=1000, freq='D'), 'x':np.random.normal(5,1,1000)})
In [84]: df = pd.DataFrame({'date':pd.date_range(start, periods=1000, freq='D'), 'x':np.random.normal(5,1,1000)})

In [86]: df['month'] = df.date.dt.month

In [87]: df.head()
Out[87]:
        date         x  month
0 2011-01-01  5.139113      1
1 2011-01-02  3.774586      1
2 2011-01-03  6.095986      1
3 2011-01-04  5.037072      1
4 2011-01-05  5.871760      1
2011-01-05 2011-01-05  6.308203

Now we can create a new data frame that contains the monthly averages using resample and mean.

In [58]: monthly_mean = df.resample('M').mean()

In [59]: monthly_mean.head()
Out[59]:
                   x
date
2011-01-31  4.702853
2011-02-28  5.088545
2011-03-31  5.261777
2011-04-30  4.982984
2011-05-31  4.791729

We can compute the o

Next, we need to join the two data frames together to line up the data with the monthly averages. To make this easier, I will create a year and month column in each data frame that will be used in the join/merge.

In [60]: df['month'] = df.index.month

In [61]: monthly_mean['month'] = monthly_mean.index.month

In [62]: df['year'] = df.index.year

In [63]: monthly_mean['year'] = monthly_mean.index.year

In [64]: df_joined = pd.merge(df, monthly_mean, how='left', on=('year', 'month'))

In [65]: df_joined.head()
Out[65]:
        date       x_x  month  year       x_y
0 2011-01-01  5.388197      1  2011  4.702853
1 2011-01-02  6.442878      1  2011  4.702853
2 2011-01-03  5.979076      1  2011  4.702853
3 2011-01-04  2.846689      1  2011  4.702853
4 2011-01-05  5.103524      1  2011  4.702853

Finally, the new column can be constructed by subtracting columns.

In [66]: df_joined['month_diff'] = df_joined.x_x - df_joined.x_y

In [67]: df_joined.head()
Out[67]:
        date       x_x  month  year       x_y  month_diff
0 2011-01-01  5.388197      1  2011  4.702853    0.685344
1 2011-01-02  6.442878      1  2011  4.702853    1.740025
2 2011-01-03  5.979076      1  2011  4.702853    1.276223
3 2011-01-04  2.846689      1  2011  4.702853   -1.856164
4 2011-01-05  5.103524      1  2011  4.702853    0.400670

EDIT: If you want the difference with the historic monthly averages, make the following changes.

Add the month, group by, and aggregate to get the monthly averages.

In [88]: monthly_mean = df.groupby('month').agg('mean')

Now the process proceed as before, join, this time just by 'month', and compute the difference.

In [90]: df_joined = pd.merge(df, monthly_mean, how='left', on='month')

In [91]: df_joined.head()
Out[91]:
        date       x_x  month       x_y
0 2011-01-01  5.139113      1  4.972604
1 2011-01-02  3.774586      1  4.972604
2 2011-01-03  6.095986      1  4.972604
3 2011-01-04  5.037072      1  4.972604
4 2011-01-05  5.871760      1  4.972604

In [92]: df_joined['month_diff'] = df_joined.x_x - df_joined.x_y

In [93]: df_joined.head()
Out[93]:
        date       x_x  month       x_y  month_diff
0 2011-01-01  5.139113      1  4.972604    0.166509
1 2011-01-02  3.774586      1  4.972604   -1.198018
2 2011-01-03  6.095986      1  4.972604    1.123382
3 2011-01-04  5.037072      1  4.972604    0.064468
4 2011-01-05  5.871760      1  4.972604    0.899156
yardsale8
  • 940
  • 9
  • 15
  • I have several years i.e. from 1951-2005. I will get only 12 mean (Monthly mean of the entire dataset) in my case. Now how to subtract those means to the respective days of that month of each year. – imran khan Feb 07 '20 at 15:37
  • You were almost there with your updated question, just needed to join the old and new tables by month. – yardsale8 Feb 07 '20 at 15:54
0

Thank you, everyone. I am able to solve the problem.
I hope it is right.

Anomaly_Values = Data.sub(Data.groupby([Data.index.month]).transform('mean'))

Let me know if there is any problem in the solution.

imran khan
  • 105
  • 5