
I have downloaded a published netCDF file containing global climate data for the past 120,000 years (see the public .nc file here: https://figshare.com/articles/dataset/LateQuaternary_Environment_nc/12293345/3). It has many variables, including temperature, precipitation, etc.

All I want is the global time-averaged precipitation over the last 120,000 years, i.e. a single map of mean precipitation for the whole planet. An important point is that the time intervals are not equally spaced; they vary between 1 and 2 thousand years. While trying to figure this out, I am running into many problems with Xarray, as I have not worked with netCDF files before. I tried this simple approach:

import xarray as xr

climate_file = 'LateQuaternary_Environment.nc'  # dataset linked above
ds = xr.open_dataset(climate_file, decode_times=False)
ppt = ds.precipitation
ppt_avg = ppt.mean('time')

However, when I print(ppt_avg), all the values in the array are NaNs. Also, even if this gave actual values, I'm not sure it would be the correct mean: the time intervals are all different, so wouldn't I have to weight them somehow, or resample the time data? I don't understand how time averaging works for netCDF files or Xarray functions, and I'm not even sure Xarray is the right tool.

Any help is welcome, thanks!

g.is.stuck
  • I 'solved' this by using another (related) dataset, the 800K year dataset instead of the 120k. The years were equally spaced in this new dataset. – g.is.stuck Mar 16 '22 at 16:55

1 Answer


I didn't test it, but I think a solution could be to upsample your data onto an evenly spaced time grid and interpolate before computing the average:

da = da.resample(time="1000Y").interpolate("linear")
da.mean(dim="time")
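An alternative that avoids the resampling (and the calendar issue mentioned in the comments below) is to weight each time slice by the width of its uneven time interval. This is a minimal, untested sketch on synthetic data: the dimension names (`time`, `latitude`, `longitude`), the variable name `precipitation`, and the time values here are assumptions standing in for the real file, which I don't have open.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the real dataset: uneven steps of 1-2 kyr,
# as described in the question. Replace with ds.precipitation.
time = np.array([-120.0, -118.0, -117.0, -115.0, -114.0, -113.0])  # kyr BP
rng = np.random.default_rng(0)
ppt = xr.DataArray(
    rng.random((time.size, 3, 4)),
    dims=("time", "latitude", "longitude"),
    coords={"time": time},
    name="precipitation",
)

# Per-sample interval widths: np.gradient uses midpoint spacing in the
# interior and half-steps at the edges, a reasonable weight for each slice.
weights = xr.DataArray(np.gradient(time), dims="time")

# Weighted time mean; .weighted(...).mean() skips NaNs by default.
ppt_avg = ppt.weighted(weights).mean(dim="time")
print(ppt_avg.shape)  # one value per (latitude, longitude) cell
```

Because this works directly on the numeric time coordinate, it doesn't need `decode_times` to succeed at all.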
Thrasy
  • Thank you Thrasy! This is something I tried yesterday too, but I immediately got the error: "TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'". So I gave up -- it seems the time unit isn't quite right? – g.is.stuck Jan 13 '22 at 13:55
  • @g.is.stuck I think it may be because you set decode_times=False in open_dataset. – Thrasy Jan 13 '22 at 15:56
  • Thank you! That is probably it, but the reason I put that in was to solve another error... If I delete the decode_times=False, then I get: "ValueError: unable to decode time units 'years since present' with 'the default calendar'. Try opening your dataset with decode_times=False or installing cftime if it is not installed." – g.is.stuck Jan 13 '22 at 16:57
  • I never had this issue but I guess it makes sense that -120,000 does not fit nicely in the default calendar. `mean(dim='time')` should work though. For the upsampling, if you can figure out the time grid you want, you should be able to do it with `da.reindex(time=new_grid, method=None).interpolate_na(dim='time', method='linear')`. – Thrasy Jan 13 '22 at 20:20
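The reindex-then-interpolate approach from the last comment can be sketched as below. The data and the target grid are made up for illustration; `reindex` inserts NaNs at the new time steps, and `interpolate_na` fills them linearly along the numeric time coordinate.

```python
import numpy as np
import xarray as xr

# Toy data on an uneven time grid (values and times are illustrative).
time = np.array([-10.0, -8.0, -7.0, -5.0, -4.0])
da = xr.DataArray(
    np.arange(time.size, dtype=float),
    dims="time",
    coords={"time": time},
)

# Evenly spaced target grid spanning the same range.
new_grid = np.arange(-10.0, -3.0, 1.0)  # -10, -9, ..., -4

# reindex puts NaNs at times not present in the original;
# interpolate_na fills them by linear interpolation along 'time'.
da_even = (
    da.reindex(time=new_grid, method=None)
      .interpolate_na(dim="time", method="linear")
)
avg = da_even.mean(dim="time")
print(float(avg))
```

Note that a plain mean over the evenly resampled grid is then an unweighted mean, which is what makes it valid for unevenly sampled input.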