
I have a dataset (a netCDF4 input_file) with dimensions (504, 720, 500), where the first dimension is time, holding datetime values:

0     1979-01-15
1     1979-02-15
2     1979-03-15
3     1979-04-15
4     1979-05-15
         ...    
499   2020-08-15
500   2020-09-15
501   2020-10-15
502   2020-11-15
503   2020-12-15
Length: 504, dtype: datetime64[ns]

There is a variable whose values I want to average per month. Ultimately I would like 12 values: the average of the variable for each calendar month (the month taken from the first dimension), across all years.
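In other words, the target is a monthly climatology: first average each time step over the two spatial dimensions, then average those results across all years that share a calendar month. A minimal NumPy sketch of that two-step reduction is below. It assumes `times` and `values` are hypothetical in-memory stand-ins for the file's time coordinate and variable (random data on a small 4 x 5 grid instead of 720 x 500); nothing here is read from the actual netCDF file.

```python
import numpy as np

# Hypothetical stand-ins for the file's contents: 504 monthly time steps
# (1979-01 .. 2020-12) and a small 4 x 5 grid instead of 720 x 500.
times = np.arange('1979-01', '2021-01', dtype='datetime64[M]')
rng = np.random.default_rng(0)
values = rng.random((times.size, 4, 5))

# Step 1: spatial mean of each time step -> shape (504,)
spatial_mean = values.mean(axis=(1, 2))

# Step 2: calendar month (1..12) of each time step.
# Casting to datetime64[M] also works on datetime64[ns] input; as integers,
# these count months since 1970-01, so modulo 12 gives 0 for January.
months = times.astype('datetime64[M]').astype(int) % 12 + 1

# Step 3: average the spatial means that share a calendar month -> 12 values
monthly_climatology = np.array(
    [spatial_mean[months == m].mean() for m in range(1, 13)]
)
print(monthly_climatology.shape)  # (12,)
```

Because every time step covers the same grid, averaging the per-step spatial means within a month gives the same result as averaging all grid cells of that month at once.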

I tried looping over it like this:

# empty dataframe
df = pd.DataFrame(columns = ['Month', 'Value'])

for i in range(size(df['time'])):
    month = input_file['time'][i].month # get the current month
    avg = np.average(input_file['values'][i, :, :]) # average for the month of that year

    # append to df
    df = df.append(pd.DataFrame({'Month' : month,
                                 'Value' : avg})   

But from here on I am a bit lost: this doesn't work (invalid syntax), and I would still need to loop over the values again to get the average for each month separately.
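The loop approach can be made to work as well. The sketch below is one possible corrected version, assuming `times` (a pandas DatetimeIndex) and `values` (a NumPy array shaped time x lat x lon) are hypothetical stand-ins for `input_file['time']` and `input_file['values']`. It loops with `len()` over the time axis (rather than `size(df['time'])`, which refers to the still-empty DataFrame), collects one dict per time step in a list instead of calling `df.append` (which was removed in pandas 2.0; the "invalid syntax" itself comes from the missing closing parenthesis), builds the DataFrame once, and then lets `groupby` do the per-month averaging so no second manual loop is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for input_file['time'] and input_file['values']:
# the 15th of each month from 1979-01 to 2020-12, on a small 4 x 5 grid.
times = pd.date_range('1979-01-15', periods=504, freq=pd.DateOffset(months=1))
rng = np.random.default_rng(0)
values = rng.random((len(times), 4, 5))

records = []  # collect rows in a plain list, build the DataFrame once at the end
for i in range(len(times)):
    month = times[i].month              # calendar month of time step i
    avg = np.average(values[i, :, :])   # spatial mean for that month of that year
    records.append({'Month': month, 'Value': avg})

df = pd.DataFrame(records)  # 504 rows, one per time step

# Average all years that share a calendar month -> 12 rows indexed by Month
monthly = df.groupby('Month')['Value'].mean()
print(monthly)
```

Building the list of dicts and converting once also avoids the error pandas raises when a DataFrame is constructed from scalars only without an index.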

B.Quaink

2 Answers


Assuming input_file is opened as an xarray Dataset and the 2nd and 3rd dimensions are lat and lon, it seems what you are trying to do is just:

input_file.mean(dim=['lat', 'lon'])

Then you can convert the result to a pandas DataFrame with .to_dataframe().

Thrasy

I'm not sure if this is what you need:

import xarray as xr

ds = xr.open_dataset('file.nc')
ds.resample(time='M').mean()
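Note the difference between resampling and grouping. Resampling bins the time axis into consecutive periods (one per month of each year), so on data that is already monthly it returns one value per time step; grouping on the month number pools all years and yields the 12 values asked for. In xarray that grouping is usually written as ds.groupby('time.month').mean(). The sketch below illustrates the contrast with pandas on a synthetic series, a hypothetical stand-in for the spatially averaged variable, using 'MS' (month start) as the resampling frequency.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series standing in for the spatially averaged variable
times = pd.date_range('1979-01-15', periods=504, freq=pd.DateOffset(months=1))
series = pd.Series(np.arange(504, dtype=float), index=times)

# resample: one bin per month of each year -> still 504 values for monthly data
per_year_month = series.resample('MS').mean()

# groupby on the month number: pools all years -> 12 values
per_calendar_month = series.groupby(series.index.month).mean()

print(len(per_year_month), len(per_calendar_month))  # 504 12
```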
aaossa