NetCDF file with all data variable values missing when I read it into Python with both Xarray and netCDF4

Question

I have a netCDF file generated as model output. The file contains spatially gridded variables over a 30-year time span, and I've confirmed that it contains the data using Ferret on Linux. When I read the file into Python with either xarray or netCDF4, it reads successfully with the correct dimensions, but the data variables are all missing.

I first obtained the error:

ValueError: unable to decode time units 'growing seasons since 2071-01-01 00:00:00' with 'the default calendar'. Try opening your dataset with decode_times=False or installing cftime if it is not installed.

So, I added the following lines in order to solve the time issue:

import pandas as pd
import xarray as xr

ds = xr.open_dataset('my_file.nc4', decode_times=False)
units, reference_date = ds.time.attrs['units'].split('since')
ds['time'] = pd.date_range(start=reference_date, periods=ds.sizes['time'], freq='A')

Now there is no error when reading the file, but all of the data variables show NaN. I don't have a problem reading any other netCDF files. I have a very similar file with the extension '.nc' instead of '.nc4', which is one of the tiles that make up the final file, and it reads in with all data present. I suspect some sort of disagreement between the dimensions of my dataset and xarray. Here is the summary of the dataset:

Dataset summary

Welcome to Stack Overflow! This seems like a frustrating problem. Assuming your data is in netCDF4 format, xarray uses the [netcdf4-python](https://unidata.github.io/netcdf4-python/) library to read it. So if you open the data with that library directly and also see NaNs, the problem is not occurring in xarray. You may have a low-level encoding problem in your data, but we can't help diagnose it without the dataset and the full code you're using to read it. Please try to create a [minimal reproducible example](/help/minimal-reproducible-example). — Michael Delgado, Dec 04 '21 at 18:20
Also, how are you checking that the variables are all missing? What is the result of `print(ds.notnull().any())` immediately after reading in the data with xarray? — Michael Delgado, Dec 04 '21 at 18:23
I'm not sure how to create a minimal reproducible example of this netCDF file because it's a bit complex. The result of `print(ds.notnull().any())` is `Dimensions: () Data variables: (12/18) ADAT bool True CO2A bool True CWAM bool True DRCM bool True ETCM bool True ETCP bool True ... ... PRCP bool True ROCM bool True SWXM bool True TMAXA bool True TMINA bool True scen bool True` — spurdom, Dec 06 '21 at 15:47
It looks like all of your variables contain non-null values. Are you looking at the "..." and assuming the data is null? netCDF4 and xarray load data lazily, meaning only the parts of the array that are needed will be read, so until you load the data you won't get a full preview. If you call `ds = ds.load()`, does that solve the problem? Give the [xarray reading and writing files docs](http://xarray.pydata.org/en/stable/user-guide/io.html) a close read. — Michael Delgado, Dec 06 '21 at 18:24
I think I figured it out: by using a simple `plt.contourf(ds['myvar'][0,0,0,:,:])` I was able to confirm there is data in the netCDF file. I think the head and tail of that variable are missing values, hence the NaNs showing up when I print the summary of the dataset. Thanks for your help. — spurdom, Dec 06 '21 at 22:43
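The conclusion reached in the comments can be reproduced with a small sketch. It uses a synthetic array rather than the original file, so the shape, the position of the valid block, and the names `values` and `da` are all assumptions made for illustration. The point is that a printed summary shows only the first and last few values, so a variable whose edges are missing can look entirely NaN even though most of the interior is populated.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one model variable (not the original data):
# NaN everywhere except an interior block of valid grid cells.
values = np.full((30, 40, 60), np.nan)
values[:, 10:30, 15:45] = 1.0
da = xr.DataArray(values, dims=("time", "lat", "lon"), name="myvar")

# The printed preview is truncated to the head and tail of the array,
# which here are all NaN.
print(da)

# Checks that look at the whole array instead of the preview.
has_data = bool(da.notnull().any())         # is any value non-missing?
valid_fraction = float(da.notnull().mean())  # share of non-missing cells
first_corner = float(da[0, 0, 0])            # a corner cell, NaN here
interior_mean = float(da.mean(skipna=True))  # mean over valid cells only

print(has_data, valid_fraction, first_corner, interior_mean)
```

On a real dataset, the same `notnull()` checks applied to `ds['myvar']`, or a quick plot of one 2-D slice as in the last comment, would likely be enough to tell a truly empty variable from one that only has missing values at its edges.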

0 Answers