Obtain mean value of specific area in netCDF

Question

I am trying to plot a time series of the sea surface temperature (SST) for a specific region from a .nc file. The SST is a three-dimensional variable (lat,lon,time), that has mean daily values for a specific region from 1982 to 2016. I want my plot to reflect the seasonal sst variability of the entire period of time. I assume that what I need to do first is to obtain a mean sst value for my lat,lon region for each of the days with which I can work alter on. So far, I assume that I need to read the .nc file and the variables:

import netCDF4 as nc

f = nc.Dataset('cmems_SST_MED_SST_L4_REP_OBSERVATIONS_010_021_1639073212518.nc')
sst = f.variables['analysed_sst'][:]
lon = f.variables['longitude'][:]
lat = f.variables['latitude'][:]

Next, following the code suggested here, I tried to reshape and obtain the mean, but an error pops up:

global_average= np.nanmean(sst[:,:,:],axis=(1,2))
annual_temp = np.nanmean(np.reshape(global_average, (34,12)), axis = 1) 

#34 years between 1982 and 2016, and 12 months per year.

ERROR cannot reshape array of size 14008 into shape (34,12)

From here I found different ways, like using cdo or nco (which didn't work due installation problems) among others, which were not suitable for my case. I used nanmean because know that in MATLAB this is done using the nanmean function. I am quite new to this topic and I would like to ask for some hints, like, where should I focus more or what path is more suitable for this case. Thank you!!

In the link that you sent, they use monthly data of 117 years. You use daily data. So you create a global average of temperature for each day. For some reason, you have 14,008 data points which is > 34*365 = 12,410. Without the dataset this is difficult to solve, but what you tried with the reshape can not work, as you don't have 34*12= 408 datapoints in your global_average dataset — steTATO, Dec 14 '21 at 12:27
You should find out more about the shape of the original dataset first and which datapoints are from which year. — steTATO, Dec 14 '21 at 12:37
Okat, thank you @steTATO , I will provide more details about the dataset shortly. — OlegRuskiy, Dec 14 '21 at 13:35
have you considered [xarray](https://xarray.pydata.org/en/stable/index.html)? This would be `ds = xr.open_dataset(...)`, then `ds.groupby('time.season').mean(dim='time')` — Michael Delgado, Dec 14 '21 at 21:59

Redshoe · Answer 1 · 2021-12-30T04:33:06.570

Handling daily data with just pure python is difficult because you should consider leap years and sub-setting a region require tedious indexing striding....

As steTATO mentioned, since the data that you are working has daily temporal resolution you need to consider the following

You need to reshape the global_average in the shape of (34,365) or (34,366) depending on the year (1984,1988,1992,1996,2000,2004,2008,2012,2016). So your above code should look something like

annual_temp = np.nanmean(np.reshape(global_average, (34,365)), axis = 1)

But, like I said, because of the leap years, you can't do the things you want by simply reshaping the global_average.

If I had no choice but to use python only, I'd do the following

import numpy as np

def days_in_year(in_year):
   leap_years = [1984,1988,1992,1996,2000,2004,2008,2012,2016]
   if (in_year in leap_years):
      out_days = 366
   else:
      out_days = 365

   return out_days


# some of your code, importing netcdf data

year = np.arange(1982,2017)

global_avg= np.nanmean(sst[:,:,:],axis=(1,2))

annual_avgs = []
i = 0
for yr in range(35):
   i  =  i + days_in_year(year[yr])
   f  =  i - days_in_year(year[yr])

   annual_avg  =  np.nanmean(global_avg[i:f])
   annual_avgs.append(annual_avg)

Above code basically takes and averages by taking strides of the global_avg considering the leap year, and saving it as annual_avgs.

Obtain mean value of specific area in netCDF

1 Answers1