
I tried to combine more than 300 NetCDF files into one with xarray, but it has been running for over three days and the output NetCDF file is only about 5 GB. Each individual NetCDF file is about 1.5 GB. Can you help me combine these NetCDF files into one with this structure?

<xarray.Dataset>
Dimensions:  (lat: 124, lon: 499, time: 79)
Coordinates:
  * lat      (lat) float64 50.96 50.96 50.97 50.97 ... 51.27 51.27 51.27 51.27
  * lon      (lon) float64 16.52 16.53 16.53 16.53 ... 17.77 17.77 17.77 17.77
  * time     (time) datetime64[ns] 2015-07-10 2015-07-22 ... 2017-08-10
Data variables:
    vel      (lat, lon) float64 ...
    coh      (lat, lon) float64 ...
    cum      (time, lat, lon) float64 ...


I tried it with this code, but it is still running (more than three days) and the output file is already over 5 GB.

import xarray
import dask

# Keep dask from splitting large chunks while concatenating
dask.config.set({"array.slicing.split_large_chunks": False})

# Lazily open every file and concatenate along the time dimension
ds = xarray.open_mfdataset('../data/all-nc/*.nc', combine='nested', concat_dim="time")

# Writing the output triggers the actual read and computation
ds.to_netcdf('../data/all-nc-files.nc')

Thank you very much!

1 Answer


You might want to try this with nctoolkit, which uses CDO as a backend. This will probably be faster:

import nctoolkit as nc

# Open all of the files as a single nctoolkit dataset
ds = nc.open_data('../data/all-nc/*.nc')
# Merge the files along the time dimension
ds.merge("time")
# Write the result as a compressed (zipped) netCDF file
ds.to_nc('../data/all-nc-files.nc', zip = True)
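
The zip = True option writes a compressed netCDF4 file, and the merge itself is carried out by CDO rather than in a Python process, so it avoids building the very large dask graph that open_mfdataset creates for 300+ files, which is most likely where the xarray approach is losing time.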

Note: I am not sure why you are merging files that are already very large into an even larger one. If these are compressed netCDF files, you will end up with a single file of over 300 GB (more than 300 files at roughly 1.5 GB each). I have worked with a lot of netCDF data in my time, but I have never seen anyone produce a file that large. It is almost certainly more efficient to simply leave the files as they are instead of merging them.
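
If the goal is just to analyse all of the data together, you can treat the separate files as one logical dataset without ever writing a merged copy. Below is a minimal sketch of that idea, reusing the open_mfdataset call from the question; the chunk size and the example grid point are illustrative assumptions, not tuned values:

import xarray

# Open all files lazily as one dataset; nothing is merged on disk
ds = xarray.open_mfdataset(
    '../data/all-nc/*.nc',
    combine='nested',
    concat_dim='time',
    chunks={'time': 1},  # assumed chunking: one time step per chunk
)

# Computations read only the slices they need, e.g. the 'cum'
# time series at a single grid point
point = ds['cum'].sel(lat=51.0, lon=17.0, method='nearest').compute()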

Robert Wilson