2

I have the following dataframe

data = pd.DataFrame({
'date': ['1988/01/12', '1988/01/13', '1988/01/14', '1989/01/20','1990/01/01'],
'value': [11558522, 12323552, 13770958, 18412280, 13770958]
}) 

Is there a way in python that I can average a value for a whole month and make that the new value for that month i.e. I want to average the 1988-01 value and make that the final value for 1988-01. I tried the groupby method but that didnt work

new_df=data.groupby(['date']).mean()
Tamarie
  • 125
  • 2
  • 6
  • 18

2 Answers2

1

Use month periods created by Series.dt.to_period:

data['date'] = pd.to_datetime(data['date'])

new_df=data.groupby(data['date'].dt.to_period('m')).mean()
print (new_df)
                value
date                 
1988-01  1.255101e+07
1989-01  1.841228e+07
1990-01  1.377096e+07

Or use DataFrame.resample and if necessary remove missing values:

new_df=data.resample('MS', on='date').mean().dropna()
print (new_df)
                   value
date                    
1988-01-01  1.255101e+07
1989-01-01  1.841228e+07
1990-01-01  1.377096e+07

Or you can use months and years separately for MultiIndex:

new_df=data.groupby([data['date'].dt.year.rename('y'),
                     data['date'].dt.month.rename('m')]).mean()
print (new_df)

               value
y    m              
1988 1  1.255101e+07
1989 1  1.841228e+07
1990 1  1.377096e+07
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • for some reason this works perfectly with the sample data but with my actual dataset it prints the values as 0. It prints out the dates correctly but the values are all 0 for the entire dataset – Tamarie Feb 25 '20 at 09:05
  • @Tamarie - First idea is check if data are numeric, hard question if yes. – jezrael Feb 25 '20 at 09:06
  • yes its numeric. i ran data.dtypes i got datetime64 and float 64 – Tamarie Feb 25 '20 at 09:11
  • @Tamarie - all 3 solutions failed? – jezrael Feb 25 '20 at 09:11
  • all 3 solutions work they just dont print out the values (it's all blank). But i can see that it has 1 value for each month – Tamarie Feb 25 '20 at 09:14
  • @Tamarie - hmmm, weird. If check datetimes format is correct? Not swapped months with datetimes? – jezrael Feb 25 '20 at 09:16
1

df=pd.read_csv("data .csv",encoding='ISO-8859-1', parse_dates=["datetime"]) print(df) print(df.dtypes)

             datetime   Temperature

0 1987-11-01 07:00:00 21.4 1 1987-11-01 13:00:00 27.4 2 1987-11-01 19:00:00 25.0 3 1987-11-02 07:00:00 22.0 4 1987-11-02 13:00:00 27.6 ... ... 27554 2020-03-30 13:00:00 24.8 27555 2020-03-30 18:00:00 23.8 27556 2020-03-31 07:00:00 23.4 27557 2020-03-31 13:00:00 24.6 27558 2020-03-31 18:00:00 26.4

df1=df.groupby(pd.Grouper(key='datetime',freq='D')).mean()

datetime Temperature
1987-11-01 24.600000 1987-11-02 25.066667 1987-11-03 24.466667 1987-11-04 22.533333 1987-11-05 25.066667 ... 2020-03-27 26.533333 2020-03-28 27.666667 2020-03-29 27.733333 2020-03-30 24.266667 2020-03-31 24.800000

  • 1
    You're right; the groupBy was failing on strings and parsing to datetime solves that. Explaining that a bit might make this less cryptic. – Jim Meyer Dec 26 '20 at 20:03
  • 1
    You're right Jim Meyer, the first thing is to pass the date and time that comes by default as an object. If it is not done, the error. And the pandas.Grouper function allows to specify a groupby statement (datetime & frec D) data = pd.read_csv ("TTPM_POTOSI2.csv", encoding = 'ISO-8859-1', parse_dates = ["Datetime"]) datetime temperature 0 1987-11-01 07:00:00 21.4 1 1987-11-01 13:00:00 27.4 2 1987-11-01 19:00:00 25.0 df1 = data.groupby (pd.Grouper (key = 'datetime', freq = 'D')). mean () datetime temperature 1987-11-01 24.600000 1987-11-02 25.066667 – faquimbayal Dec 27 '20 at 02:00