netCDF4: Approaches to dealing with multidimensional data with unused grid points

Question

I am using netCDF4 to store multidimensional data. The data has, for example, three dimensions: time = [0, 1, 2], height = [10, 20], direction = [0, 120, 180, 240, 300]. However, data does not exist for every combination (grid point). In our example, let the gaps be limited to height/direction combinations: suppose that at height == 10 we have data only for direction in {0, 120, 240}, and at height == 20 only for direction in {120, 180, 300}.

The approaches for dealing with this I see are:

1. Use a separate unidimensional Variable for each height/direction combination.
2. Use a single three-dimensional Variable over the Cartesian product, i.e., all possible combinations, and live with the fact that for some combinations all values are masked.
3. Use different location dimension definitions for each height and a two-dimensional Variable for each height.

Are there other approaches, and what are the reasons, both principled and practical, for preferring one approach over another?

score 1 · Accepted Answer · answered May 16 '17 at 21:33

Basically, your option 2 is the correct one. NetCDF files are gridded files, so the natural structure for the data you describe is to define three dimensions: time, height and direction. For the array entries for which no data exists, set the value to the one defined by the variable's metadata attribute:

_FillValue

This means that any software reading the data, such as R, Python or ncview, will treat these points as "missing".

For more details on defining missing values see: http://www.unidata.ucar.edu/software/netcdf/docs/fill_values.html

score 0 · Answer 2 · answered May 26 '17 at 12:46

When reading up on metadata conventions, I encountered another option: ‘compression by gathering’ of the height and direction variables into a single location variable.

How would this work in the toy example? First gather all locations into a one-dimensional list:

0: 10,0   *
1: 10,120 *
2: 10,180
3: 10,240 *
4: 10,300
5: 20,0
6: 20,120 *
7: 20,180 *
8: 20,240
9: 20,300 *

Then location = [0, 1, 3, 6, 7, 9], and the data is defined using only two dimensions: time and location, where location has a compress: "height direction" attribute. It is probably best to add a two-dimensional auxiliary coordinate variable to make the relationship between the location indices and the height/direction values explicit: height_direction = [(10,0), (10,120), (10,240), (20,120), (20,180), (20,300)].

Given that there seems to be no library support for this, it is not necessarily the most convenient option in all respects. However, it does seem a legitimate option to consider given that it is encoded in a metadata standard, “NetCDF Climate and Forecast (CF) Metadata Conventions”.
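
Without relying on any library support, the index arithmetic for the toy example can be done by hand. The following is a minimal pure-Python sketch, assuming the flattened index runs over height first and direction second (as in the list above); the helper name `uncompress_index` and the `compressed` values are made up for illustration.

```python
heights = [10, 20]
directions = [0, 120, 180, 240, 300]
# Height/direction combinations that carry data (taken from the question).
valid = {10: [0, 120, 240], 20: [120, 180, 300]}

# Gather: keep the flattened (height, direction) index of each used grid point.
location = [
    hi * len(directions) + di
    for hi, h in enumerate(heights)
    for di, d in enumerate(directions)
    if d in valid[h]
]

def uncompress_index(loc):
    """Map a flattened location index back to its (height, direction) pair."""
    hi, di = divmod(loc, len(directions))
    return heights[hi], directions[di]

# Auxiliary coordinate making the index-to-value relationship explicit.
height_direction = [uncompress_index(loc) for loc in location]

# Scatter: expand one time step of compressed values back onto the full grid.
# None stands in for a missing value here; the data values are dummies.
compressed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
full = [[None] * len(directions) for _ in heights]
for loc, value in zip(location, compressed):
    hi, di = divmod(loc, len(directions))
    full[hi][di] = value
```

The compressed variable stores 6 values per time step instead of 10, at the cost of every reader having to perform the scatter step itself.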
