
I am a bit of a netCDF in python noob so please excuse this noob question.

I have a folder filled with circa 3650 netCDF4 files, one file per day for a decade. The files are named yyyymmdd.nc (e.g. 20100101.nc, 20100102.nc, 20100103.nc, etc.). Each .nc file contains latitude, longitude, and temperature at one time point for the same area - a section of the Tonga EEZ.

What I am trying to do is compute the average temperature for each lat and lon across all files, i.e. I want to end up with one .nc file that has the same lats and lons and the average temperature across the 10 years.

I have tried different versions of code; they usually end up looking something like this:

from glob import glob
import numpy as np
import xarray as xr
files = glob('*.nc')
ds = xr.open_mfdataset(files)
mean = np.mean(ds['temp'][:, 0].values)

This code gives me the average temperature within a .nc file for all the .nc files, and not the average temperature for each lat and lon across a decade's worth of files.
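From reading around, I suspect what I actually want is the per-pixel mean over the time dimension. Something along these lines is my best guess, but I am not sure it is right (I am assuming the combined dataset ends up with a time dimension and that my variable is called temp):

from glob import glob
import xarray as xr
files = glob('*.nc')
# combine the daily files along their time coordinate
ds = xr.open_mfdataset(files, combine='by_coords')
# average over time, keeping lat and lon, then write a single output file
mean_temp = ds['temp'].mean(dim='time')
mean_temp.to_netcdf('decadal_mean.nc')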

Any and all help is much appreciated.

Thank you.

kawakawa
  • I have a feeling that `mean = np.mean(ds['temp'][:, 0].values)` is not a good approach, as you are already cutting some data there. What are the dimensions of ds['temp']? Does `mean = np.mean(ds['temp'][:].values, axis=0)` work? – msi_gerva Jun 07 '21 at 09:13

2 Answers


Assuming you are working on Linux/macOS, this can be done easily using my nctoolkit package (see details here).

The following will calculate the mean across all files and then plot the results:

import nctoolkit as nc
from glob import glob
files = glob('*.nc')
ds = nc.open_data(files)
ds.ensemble_mean()
ds.plot()

nctoolkit uses CDO as a back-end by default, but can use NCO as well, which can result in a performance improvement. So the following might be faster:

import nctoolkit as nc
from glob import glob
files = glob('*.nc')
ds = nc.open_data(files)
ds.ensemble_mean(nco=True)
ds.plot()
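If you want the 10-year mean written out as a single .nc file rather than just plotted, the dataset can be saved afterwards. Something like the line below should do it, though the exact method name (to_nc) and the output filename are my assumptions here, so check the nctoolkit documentation:

ds.to_nc('decadal_mean.nc')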
Robert Wilson
  • This works like a charm. Thumbs up for nctoolkit. – kawakawa Jun 15 '21 at 09:29
  • Hi Robert, thanks again for introducing me to this. nctoolkit is really making things easier for me. I have a quick follow-up question: using the ensemble methods, can I also calculate the standard deviation? I have tried ds.ensemble_stdev(nco=True) but this errors out with stdev not found. – kawakawa Jun 29 '21 at 23:06
  • Thanks. At the minute, no, but I can implement this relatively easily. If you need this functionality urgently you could raise an issue here https://github.com/pmlmodelling/nctoolkit/issues and I can quickly add it to the dev version. – Robert Wilson Jul 01 '21 at 05:10

You can use the cdo package to do this using a wildcard in the input file name. I've only tested it with a small number of files, though; there is a caveat that you might hit a system limit on the number of open files.

from cdo import *
cdo = Cdo()
cdo.ensmean(input='*.nc', output='ensmean.nc')

This is basically the equivalent of the command-line call to cdo:

cdo ensmean *.nc ensmean.nc 

That said, it sounds to me like it would be better to merge them together along the time axis and then use timmean:

cdo.timmean(input=cdo.mergetime(input='*.nc'), output='timmean.nc')

which again is the Python equivalent of

cdo mergetime *.nc all.nc
cdo timmean all.nc timmean.nc 

Try both and see which one works/is fastest :-)
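Either way, you can sanity-check the output with xarray and confirm it still has your lat/lon grid with a single averaged temperature field (a quick sketch; I am assuming the variable is called temp, as in your snippet):

import xarray as xr
out = xr.open_dataset('timmean.nc')
print(out)           # should show lat, lon and a single time step
print(out['temp'])   # the 10-year mean temperature field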

ClimateUnboxed