
I tried to combine two netCDF files along the time dimension using the command below. The first file covers days 1 to 10 and the second covers days 11 to 20. Time is recorded as year/month/day in the netCDF files.

ds = xr.open_mfdataset([file1, file2], combine='nested',concat_dim=["time"])

The files have data variables with 3-D (time, lon, lat) dimensions, where lon and lat are the same in both files. After running the command above, the time array in the merged output did expand to 20 days, but the data arrays remained at 10 days.

Using the cdo mergetime utility, I was able to work around the problem:

cdo mergetime file1.nc file2.nc mergedfile.nc 

However, I am doing this in a script and would prefer to use xarray over cdo. Any comments on why xarray doesn't combine the non-time arrays correctly in this scenario would help.

Tarandeep Kalra
  • xr.open_mfdataset is the right way to do this. Can you please include the full error message and all of the code or information about the data needed for us to understand the problem? See this guide to [crafting a minimal bug report](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports). Thanks! – Michael Delgado May 11 '22 at 19:48
  • There is no error. The issue is that the time dimension gets added to every non-time-varying variable after the xarray concatenation. It should only be added to the time-varying arrays. That does not happen with the cdo command. – Tarandeep Kalra May 12 '22 at 20:30
  • Please edit your question to show specifically what’s happening and what you would like to happen. Xarray isn’t cdo, so you shouldn’t expect the same default behaviors. But if you provide us more information about what’s going on I'm sure we can figure out a solution! – Michael Delgado May 12 '22 at 22:12
  • In your question you say that all arrays have dimension (lat, lon, time) but your comment implies otherwise. Can you clarify the data structure of each file, e.g. by opening each file with xr.open_dataset and printing the dataset? – Michael Delgado May 12 '22 at 22:16
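Following the suggestion to inspect each file, here is a minimal sketch that reports which data variables carry the time dimension. Running it on each input file (opened with xr.open_dataset) and on the merged result shows whether a static variable picked up time during concatenation. The helper name split_by_time and the variable names temp and depth are hypothetical; the in-memory dataset only stands in for file1.

```python
import numpy as np
import pandas as pd
import xarray as xr


def split_by_time(ds, dim="time"):
    """Return (time-varying, static) data-variable names of a Dataset."""
    varying = [name for name, da in ds.data_vars.items() if dim in da.dims]
    static = [name for name, da in ds.data_vars.items() if dim not in da.dims]
    return varying, static


# Stand-in for one input file (hypothetical variable names):
# 'temp' varies in time, 'depth' does not.
ds = xr.Dataset(
    {
        "temp": (("time", "lat", "lon"), np.zeros((10, 3, 4))),
        "depth": (("lat", "lon"), np.ones((3, 4))),
    },
    coords={"time": pd.date_range("2022-01-01", periods=10, freq="D")},
)

print(ds)  # for a real file: print(xr.open_dataset(file1))
print(split_by_time(ds))  # (['temp'], ['depth'])
```

If a variable listed as static in the inputs shows up as time-varying in the merged dataset, the concatenation options are the likely cause.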

2 Answers


You should try different options in open_mfdataset. I was able to replicate your problem and solve it by changing some options:

#!/usr/bin/env python3
import numpy as np
import xarray as xr
from netCDF4 import Dataset

nx = ny = 10
ntime = 5


def write_test_file(path, start):
    """Write a test file with a time-varying 3-D and a static 2-D variable."""
    # format must be passed by keyword: the third positional argument is clobber
    with Dataset(path, 'w', format='NETCDF3_CLASSIC') as ncout:
        ncout.createDimension('x', nx)
        ncout.createDimension('y', ny)
        ncout.createDimension('time', None)  # unlimited (record) dimension
        # -------------------------------
        xv = ncout.createVariable('x', 'float32', ('x',))
        xv[:] = np.linspace(0, nx, nx)
        yv = ncout.createVariable('y', 'float32', ('y',))
        yv[:] = np.linspace(0, ny, ny)
        tv = ncout.createVariable('time', 'float64', ('time',))
        tv[:] = np.linspace(0, ntime, ntime) * 3600
        tv.setncattr('units', 'seconds since ' + start)
        dataout = ncout.createVariable('data_3d', 'float32', ('time', 'y', 'x'))
        dataout[:] = np.random.random((ntime, ny, nx))
        dataout = ncout.createVariable('data_2d', 'float32', ('y', 'x'))
        dataout[:] = np.random.random((ny, nx))


# ----------------------
f_a = 'test_01.nc'
f_b = 'test_02.nc'
write_test_file(f_a, '2022-05-12 00:00:00')
write_test_file(f_b, '2022-05-13 00:00:00')
# ------------------------------------------------------------------------------------------------------------------------
# Default options: data_2d picks up a time dimension in the output.
with xr.open_mfdataset([f_a, f_b], combine='nested', concat_dim=["time"]) as ds:
    ds.to_netcdf('merged_default.nc')
# ------------------------------------------------------------------------------------------------------------------------
# Only concatenate variables that already have a time dimension;
# the others are taken from the first file.
with xr.open_mfdataset([f_a, f_b], concat_dim='time', data_vars='minimal', combine='nested',
                       coords='minimal', compat='override') as ds:
    ds.to_netcdf('merged_minimal.nc')

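So in "merged_default.nc" both the original 2D and 3D variables end up as 3D variables, whereas in "merged_minimal.nc" the 2D variable stays 2D and the 3D variable stays 3D. The reason is that the default data_vars='all' concatenates every data variable along the concatenation dimension, even variables that do not have it; data_vars='minimal' only concatenates variables in which that dimension already appears.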
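The same comparison can be reproduced without writing any files. This is a minimal sketch that calls xr.concat directly on two in-memory datasets, on the assumption that it behaves like open_mfdataset's nested combine for a flat list of files. The helper make_part is hypothetical, and the variable names mirror the test files above.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_part(start):
    """One hypothetical 10-day chunk: a time-varying and a static variable."""
    time = pd.date_range(start, periods=10, freq="D")
    return xr.Dataset(
        {
            "data_3d": (("time", "y", "x"), np.random.random((10, 3, 4))),
            "data_2d": (("y", "x"), np.arange(12.0).reshape(3, 4)),
        },
        coords={"time": time},
    )


parts = [make_part("2022-01-01"), make_part("2022-01-11")]

# data_vars='all': the static variable is stacked along time as well.
merged_all = xr.concat(parts, dim="time", data_vars="all")

# data_vars='minimal': only variables that already have 'time' are
# concatenated; compat='override' takes the rest from the first dataset.
merged_min = xr.concat(
    parts, dim="time", data_vars="minimal", coords="minimal", compat="override"
)

print(merged_all["data_2d"].dims)  # ('time', 'y', 'x')
print(merged_min["data_2d"].dims)  # ('y', 'x')
print(merged_min.sizes["time"])    # 20
```

Note that compat='override' skips checking whether the static variables agree between files, so it is only appropriate when they are known to be identical.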

msi_gerva

You could do this in Python using my package nctoolkit. It uses CDO as a backend, so it will do the same thing as your CDO command.

import nctoolkit as nc
ds = nc.open_data([file1, file2])
ds.merge("time")
# convert to an xarray object if needed
ds_xr = ds.to_xarray()
Robert Wilson