
I have hundreds of NetCDF files that I obtained from the ERA5-Land dataset. The temporal resolution of the data is hourly, but I need to aggregate them into daily means. A single calculation is straightforward in CDO (see below), but when I try to loop through the files I get an error message suggesting that I can only process them one at a time, which would be quite laborious. I was wondering if there is a workaround, either in R or CDO. Here is my CDO syntax:

$ cdo daymean infile1980.nc outfile_day1980.nc  ##single operation works fine.

Trying to loop through

for i in C:/path/*.nc
do 
    cdo daymean "${pattern}"* "${pattern}_day.nc" 
done

cdo (Abort): Too many input streams for operator 'daymean'!

The goal is to aggregate each year's hourly data into daily data.

ClimateUnboxed

2 Answers


First of all, why do you have hundreds of files? You should combine retrievals into single requests, or they may end up banning your user ID for clogging up the CDS queues with zillions of separate requests ;-) (Seriously though, please read their wiki on combining requests.)

Then, to answer your question: yes, there is a limit on the number of input streams a CDO operator can open (usually 256), which means you need to use loops.

Anyway, to loop over years, just do this:

for year in $(seq 1950 2022) ; do 
    cdo daymean infile${year}.nc outfile_day${year}.nc
done
cdo mergetime outfile_day????.nc all_data_daily.nc 
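
Since each year is processed independently, the loop above can also be parallelized with plain bash background jobs, which is usually much faster on a multi-core machine. A sketch, reusing the file names from the loop above; the 4-job cap is an arbitrary assumption, and `wait -n` needs bash >= 4.3:

```shell
#!/bin/bash
maxjobs=4                      # arbitrary cap; tune to your CPU count
for year in $(seq 1950 2022); do
    # throttle: block while $maxjobs cdo jobs are still running
    while [ "$(jobs -rp | wc -l)" -ge "$maxjobs" ]; do
        wait -n
    done
    cdo daymean "infile${year}.nc" "outfile_day${year}.nc" &
done
wait                           # let the last batch finish
cdo mergetime outfile_day????.nc all_data_daily.nc
```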

Sometimes you need to nest loops (e.g. first over months and then over years); since your question doesn't specify the input file layout, it's impossible to help more precisely.
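
For instance, if the downloads were split into one file per month with names like `infile_YYYY_MM.nc` (a hypothetical layout; adjust the pattern to your files), the nested loop could look like:

```shell
#!/bin/bash
for year in $(seq 1950 2022); do
    for month in $(seq -w 1 12); do       # -w zero-pads: 1 -> 01
        f="infile_${year}_${month}.nc"
        [ -f "$f" ] || continue           # skip months that were never downloaded
        cdo daymean "$f" "day_${year}_${month}.nc"
    done
done
# stitch the monthly daily means back into one series
cdo mergetime day_*.nc all_data_daily.nc
```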

EDIT: I stand corrected on the ERA5-Land span; I was looking at outdated info. Nevertheless, you shouldn't be downloading one file per step (and there is no need to split). See Robert's solution in the comments below, which is a much better way I didn't know about! I feel a new video coming on... ;-)

  • You could do that in one line with apply: `cdo -mergetime -apply,-daymean [ infile*.nc ] out.nc`. However, if the OP wants speed, parallelizing the CDO calls is typically much faster – Robert Wilson Mar 24 '23 at 12:26
  • ERA5-Land dates back to 1950: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=form One can pull as many files as one wishes by modifying the CDS API request in Python. I found a workaround. It took a few steps, but it works. The first step is to merge all the .nc files: `cdo -b F64 -f nc2 mergetime *.nc all_data.nc`. Then compute the daily mean: `cdo daymean all_data.nc all_out.nc`. Then split by year: `cdo splityear all_out.nc var_out_` – user11384727 Mar 24 '23 at 13:37
  • If you have answered your own question, please add it as an answer so other users know the question has been answered – Robert Wilson Mar 24 '23 at 13:41

I found a workaround. It took a few steps, but it works. The first step is to merge all the .nc files:

## "-b F64" helps with precision
## "-f nc2" writes NetCDF2 (64-bit offset) output, to get around the classic-format file-size limit.
## See here: https://code.mpimet.mpg.de/boards/1/topics/908

$ cdo -b F64 -f nc2 mergetime *.nc all_data.nc  

Next, compute the daily mean:

$ cdo daymean all_data.nc all_out.nc

Then split the data by year:

$ cdo splityear all_out.nc  var_out_  ## automatically appends the year to each output file name
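
Since CDO supports operator chaining, the three steps above could likely be collapsed into a single call built on Robert's `-apply` one-liner from the comments, avoiding the large intermediate files. This is an untested sketch; the `-splityear` prefix and the output prefix `var_out_` are my assumptions:

```shell
cdo -b F64 -f nc2 -splityear -mergetime -apply,-daymean [ *.nc ] var_out_
```

Note that applying `daymean` per file before merging gives the same result as merging first only if no input file splits a day across a file boundary, so check your file edges before relying on it.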