I first concatenated multiple datasets along the time dimension, and then merged a couple of meteorological variables into one xarray.Dataset. Finally, I saved the result to a netCDF file. Strangely, the data changed after it was saved. Before saving, it looked normal:

Time series before the DataArray was saved to a netCDF file.

However, when I loaded the netCDF file I had created, the data looked like this:

Time series when the netCDF was loaded.

To make the problem quick to reproduce, I sliced the input files to make them smaller. The files, together with a Python script, can be accessed here (input_data_and_codes). The code is as follows:

import xarray as xr

dirin = 'C:/mydata/'  # Input directory
dirout = dirin  # Output directory
variables = ['2m_dewpoint_temperature', '2m_temperature']  # Two variables
yrs = [2009, 2010]  # Two years

var_datasets = []

for var in variables:
    datasets = []
    for yr in yrs:
        input_file_name = f'{dirin}test_{var}_{yr}.nc'
        datasets.append(xr.open_dataset(input_file_name))
    # Concatenate the yearly files for this variable along time
    ds_combined = xr.concat(datasets, dim='time')
    var_datasets.append(ds_combined)

# Merge the two variables into one Dataset
data_before = xr.merge(var_datasets)

data_before.to_netcdf(dirout + 'test_vars.nc')
data_after = xr.open_dataset(dirout + 'test_vars.nc')

# There are 2 variables: 't2m' and 'd2m'; plot to check
data_before.t2m.plot()
data_after.t2m.plot()

What could be the possible reasons? Thank you.

Dima Chubarov
CrayonAki
  • I suspect this has something to do with how xarray aligns the data in your merge/concat steps. However, your example has a bit too much going on and relies on data that we do not have. Can you try to boil this down into something smaller? Does this happen to all variables? If not, perhaps that part can be dropped from the example? Can you create a sample dataset that reproduces the problem? – jhamman Apr 07 '23 at 20:47
  • @jhamman I tested one coordinate and found that 6 out of 7 of the variables have changed, but at different timestamps. To boil down the input data, I selected two variables and sliced them. The post has been updated. A Python script and the input data can now be accessed. Thanks! – CrayonAki Apr 10 '23 at 09:16

1 Answer

Are those extra columns in the second plot real data? My guess is that they appear where there are neither data nor coordinates. I ran into a similar problem with artificial grid cells after xarray.where(drop=True). My current guess is that xarray's plot() still shades empty values without coordinates if the empty coordinates fall within a larger map.
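
One way to tell real data from a plotting artifact is to compare the two arrays numerically instead of by eye. The following is only a minimal sketch: the helper name `summarize_difference` is hypothetical, and the small synthetic series stands in for the files from the question, which are not available here. It assumes both arrays have a `time` index.

```python
import numpy as np
import pandas as pd
import xarray as xr


def summarize_difference(before: xr.DataArray, after: xr.DataArray) -> dict:
    """Hypothetical helper: compare two DataArrays value by value."""
    # Align on shared coordinate labels only, so mismatched labels
    # do not introduce NaN-filled positions into the comparison.
    before_aligned, after_aligned = xr.align(before, after, join="inner")
    diff = np.abs(after_aligned - before_aligned)
    return {
        "same_time_axis": bool(before.indexes["time"].equals(after.indexes["time"])),
        "nan_before": int(before.isnull().sum()),
        "nan_after": int(after.isnull().sum()),
        "max_abs_diff": float(diff.max()),
        "n_changed": int((diff > 1e-6).sum()),
    }


# Synthetic stand-in for the real data (an assumption, not the question's files)
time = pd.date_range("2009-01-01", periods=6, freq="D")
before = xr.DataArray(
    np.linspace(270.0, 275.0, 6), coords={"time": time}, dims="time", name="t2m"
)

# Simulate one value that differs after a save/load round trip
after = before.copy()
after[3] = 250.0

report = summarize_difference(before, after)
print(report)
```

With the data from the question, the call would be `summarize_difference(data_before.t2m, data_after.t2m)`. If `n_changed` is nonzero, the stored values really differ and the plot is not the cause. If the values match but the NaN counts or the time axis differ, missing values or coordinates are the more likely explanation.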