How to merge different shaped netcdf4 files?

Question

I am storing weather forecasts as netcdf4 files. These netcdf4 files are split into tiles following the Google Maps tile scheme: I define a zoom level (here 6) to get the extent of each tile. Based on that extent I use the following code to slice the array:

    sliced_data = data.where(
        (data[lat_coord_name] <= maxLat)
        & (data[lat_coord_name] > minLat)
        & (data[lon_coord_name] <= maxLon)
        & (data[lon_coord_name] > minLon),
        drop=True,
    )

Here, data is an xarray.Dataset. At the end of this process I have 36 tiles for a weather model covering Central Europe.
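
The question does not show how the tile extents are derived. As a rough sketch, assuming the standard XYZ (slippy-map, Web Mercator) tiling scheme, the bounds of one tile could be computed as below. The helper name `tile_bounds` and the example tile indices are hypothetical and not taken from the original code.

```python
import math


def tile_bounds(zoom, x, y):
    """Return (min_lon, min_lat, max_lon, max_lat) in degrees for XYZ tile (x, y).

    Assumes the common slippy-map convention: x counts eastwards from
    180 degrees west, y counts southwards from the northern map edge.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level

    def lon(col):
        # Longitude of the western edge of tile column `col`
        return col / n * 360.0 - 180.0

    def lat(row):
        # Inverse Web Mercator: latitude of the northern edge of tile row `row`
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


# Hypothetical example tile at zoom level 6
minLon, minLat, maxLon, maxLat = tile_bounds(6, 33, 21)
print(minLon, minLat, maxLon, maxLat)
```

The four returned values correspond to the minLon, minLat, maxLon and maxLat names used in the slicing code above.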

My problem is merging them back into the native, untiled xarray.Dataset. The projection of the weather model differs from the projection of the map tiles, so the resulting netcdf4 files have different shapes in the x and y dimensions. As a result, I have no common axis along which xarray could align them.

The dimension of the native grid is 340x340. You can find a test dataset here

My expectation was:

import glob
import xarray

file_list = glob.glob('test_data_stackoverflow/*')
file_list.sort()
dataset = xarray.open_mfdataset(file_list, engine="h5netcdf")

But this fails because the datasets have different shapes.

I am open to using other tools such as netcdf4, h5netcdf or cdo. But the data must not be altered, e.g. by interpolating back to the original grid.

score 1 · Answer 1 · answered Apr 14 '22 at 21:00

Combining datasets with the same dimension names but different dimension sizes is not possible in an Xarray Dataset. But it is possible in a new type of Xarray data structure, currently under development, called a DataTree. Currently DataTree lives in a separate package - https://xarray-datatree.readthedocs.io/en/latest/ - but the plan is to merge it into Xarray proper soon. DataTree is used by the library ndpyramid to store multi-scale array data, very similar to the use case you are describing.

I would explore combining your datasets into a single DataTree object. First organize your data into a dict, and then create a DataTree from the dict. You will need to decide how to encode the level of the hierarchy. The simplest is to just use an integer for each zoom level, e.g.

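from datatree import DataTree  # newer xarray releases expose this as xarray.DataTree
# from_dict expects string node names and opened datasets, not file paths
data_dict = {str(level): xarray.open_dataset(path, engine="h5netcdf") for level, path in enumerate(file_list)}
dt = DataTree.from_dict(data_dict)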

The ndpyramid code might be a useful reference: https://github.com/carbonplan/ndpyramid/blob/main/ndpyramid/core.py
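
If the goal is a single untiled Dataset rather than a tree of tiles, another avenue that may be worth trying is aligning the tiles on their coordinate values. This is only a sketch under a strong assumption: that every tile still carries the native grid's x and y index coordinates after `where(..., drop=True)`, which the question does not confirm. The variable name `t2m` and the small synthetic grid are made up for illustration. No interpolation is involved; `combine_first` only fills cells that are missing in one tile with values from another.

```python
from functools import reduce

import numpy as np
import xarray as xr

# Synthetic stand-in for the native grid (assumed to have x/y index coordinates)
ny, nx = 6, 6
native = xr.Dataset(
    {"t2m": (("y", "x"), np.arange(ny * nx, dtype=float).reshape(ny, nx))},
    coords={"y": np.arange(ny), "x": np.arange(nx)},
)

# Cut two irregular "tiles" the same way as in the question; drop=True trims
# each tile to its bounding box, so the tiles end up with different shapes.
mask = (native.x + native.y) <= 4
tiles = [
    native.where(mask, drop=True),
    native.where(~mask, drop=True),
]
print([dict(t.sizes) for t in tiles])

# combine_first aligns on the union of the x/y coordinate values and fills
# cells that are NaN in the accumulated result with values from the next tile.
merged = reduce(lambda acc, tile: acc.combine_first(tile), tiles)
merged = merged.sortby(["y", "x"])
```

With real files, `tiles` would instead be a list of datasets opened from `file_list`. If the tiles have no shared index coordinates, this approach does not apply as written.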

score -1 · Answer 2 · answered Feb 02 '22 at 14:37

You can probably solve this using CDO's merge method:

cdo merge test_data_stackoverflow/* out.nc

If the 36 tiles make up a 6 x 6 grid, then mergegrid can potentially merge them:

cdo mergegrid test_data_stackoverflow/* out.nc

Robert Wilson


Output for merge ```cdo merge: Open failed on >test_data_stackoverflow/harmonie_knmi_2021_03_20_18_35_23.nc< Unsupported file structure``` – dl.meteo Feb 02 '22 at 14:57
Output for mergegrid: ```cdo (Abort): Unprocessed Input, could not process all Operators/Files``` – dl.meteo Feb 02 '22 at 14:57
cdo does not work either. – dl.meteo Feb 02 '22 at 14:58
