
So what I am doing is downloading data from a data portal, split into separate files per variable, month, and year (because it's faster to get the data this way). I have the data on my drive and now want to stitch it together. I did this successfully and now want to save the entire dataset ds with all variables.

What I describe in the following happens to ALL variables!

In the preamble, I load

import xarray as xr
import os
import numpy as np

After stitching and everything, I look at the data and it looks reasonable.

So I save the dataset with ds.to_netcdf('Data.nc')

If I reopen the data with xr.open_dataset('Data.nc'), the data is altered and does not go beyond certain values. I attached an image of this below.

Does anyone know what is happening here and how to solve it?

P.S.: I am using Jupyter Notebook on macOS, in case that is of importance.

[Image: Data alters after saving]

EDIT: Output of ncdump -hs Data.nc is:

netcdf Data {
dimensions:
    time = 561024 ;
    longitude = 3 ;
    latitude = 3 ;
variables:
    int64 time(time) ;
        time:long_name = "time" ;
        time:units = "hours since 1959-01-01 00:00:00" ;
        time:calendar = "proleptic_gregorian" ;
        time:_Storage = "contiguous" ;
        time:_Endianness = "little" ;
    float longitude(longitude) ;
        longitude:_FillValue = NaNf ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
        longitude:_Storage = "contiguous" ;
        longitude:_Endianness = "little" ;
    float latitude(latitude) ;
        latitude:_FillValue = NaNf ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
        latitude:_Storage = "contiguous" ;
        latitude:_Endianness = "little" ;
    short mpww(time, latitude, longitude) ;
        mpww:_FillValue = -32767s ;
        mpww:units = "s" ;
        mpww:long_name = "Mean period of wind waves" ;
        mpww:add_offset = 2.79603913709583 ;
        mpww:scale_factor = 3.90353786451101e-05 ;
        mpww:missing_value = -32767s ;
        mpww:_Storage = "contiguous" ;
        mpww:_Endianness = "little" ;
    short shts(time, latitude, longitude) ;
        shts:_FillValue = -32767s ;
        shts:units = "m" ;
        shts:long_name = "Significant height of total swell" ;
        shts:add_offset = 1.18743369983622 ;
        shts:scale_factor = 1.05300150544382e-05 ;
        shts:missing_value = -32767s ;
        shts:_Storage = "contiguous" ;
        shts:_Endianness = "little" ;
    short pp1d(time, latitude, longitude) ;
        pp1d:_FillValue = -32767s ;
        pp1d:units = "s" ;
        pp1d:long_name = "Peak wave period" ;
        pp1d:add_offset = 12.2260785916261 ;
        pp1d:scale_factor = 0.000189618505657455 ;
        pp1d:missing_value = -32767s ;
        pp1d:_Storage = "contiguous" ;
        pp1d:_Endianness = "little" ;
    short hmax(time, latitude, longitude) ;
        hmax:_FillValue = -32767s ;
        hmax:units = "m" ;
        hmax:long_name = "Maximum individual wave height" ;
        hmax:add_offset = 2.23715532703722 ;
        hmax:scale_factor = 1.92943508559216e-05 ;
        hmax:missing_value = -32767s ;
        hmax:_Storage = "contiguous" ;
        hmax:_Endianness = "little" ;
    short mpts(time, latitude, longitude) ;
        mpts:_FillValue = -32767s ;
        mpts:units = "s" ;
        mpts:long_name = "Mean period of total swell" ;
        mpts:add_offset = 8.83459024768542 ;
        mpts:scale_factor = 7.28539333599922e-05 ;
        mpts:missing_value = -32767s ;
        mpts:_Storage = "contiguous" ;
        mpts:_Endianness = "little" ;
    short swh(time, latitude, longitude) ;
        swh:_FillValue = -32767s ;
        swh:units = "m" ;
        swh:long_name = "Significant height of combined wind waves and swell" ;
        swh:add_offset = 1.19637698437532 ;
        swh:scale_factor = 1.04207642782417e-05 ;
        swh:missing_value = -32767s ;
        swh:_Storage = "contiguous" ;
        swh:_Endianness = "little" ;
    short shww(time, latitude, longitude) ;
        shww:_FillValue = -32767s ;
        shww:units = "m" ;
        shww:long_name = "Significant height of wind waves" ;
        shww:add_offset = 0.457024275937314 ;
        shww:scale_factor = 1.394812537195e-05 ;
        shww:missing_value = -32767s ;
        shww:_Storage = "contiguous" ;
        shww:_Endianness = "little" ;

// global attributes:
        :Conventions = "CF-1.6" ;
        :history = "2023-01-05 17:41:27 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf.bin -S param -o /cache/data3/adaptor.mars.internal-1672940486.4022062-20190-2-1c3b422b-fb80-4c15-b960-bcf6e7f0c58a.nc /cache/tmp/1c3b422b-fb80-4c15-b960-bcf6e7f0c58a-adaptor.mars.internal-1672940458.202113-20190-2-tmp.grib" ;
        :_NCProperties = "version=1|netcdflibversion=4.6.1|hdf5libversion=1.10.6" ;
        :_SuperblockVersion = 0 ;
        :_IsNetcdf4 = 1 ;
        :_Format = "netCDF-4" ;
}

The output of ds.swh.encoding differs between the original and the reloaded version:

Original

{'source': '/1959_1.nc',
 'original_shape': (744, 3, 3),
 'dtype': dtype('int16'),
 'missing_value': -32767,
 '_FillValue': -32767,
 'scale_factor': 1.0420764278241716e-05,
 'add_offset': 1.1963769843753225}

New Version

{'zlib': False,
 'shuffle': False,
 'complevel': 0,
 'fletcher32': False,
 'contiguous': True,
 'chunksizes': None,
 'source': '/Users/cgdavid/Documents/01-Forschung/01-Paper/Plate_Breakwater/New_Copernicus/Data.nc',
 'original_shape': (561024, 3, 3),
 'dtype': dtype('int16'),
 'missing_value': -32767,
 '_FillValue': -32767,
 'scale_factor': 1.0420764278241716e-05,
 'add_offset': 1.1963769843753225}
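If I understand the int16 packing right, the scale_factor and add_offset above put a hard cap on the values that can be stored. A quick check with the swh encoding values (a sketch, just plugging in the numbers from the encoding shown above):

```python
# Packed int16 values are decoded as: value = add_offset + scale_factor * packed
# Using the swh encoding carried over from the first monthly file:
add_offset = 1.1963769843753225
scale_factor = 1.0420764278241716e-05
max_representable = add_offset + scale_factor * 32767  # largest valid int16
print(max_representable)  # roughly 1.54 m
```

So no swh value above roughly 1.54 m can survive a round trip with this encoding, which would match the cut-off I see in the plot.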

I have to say that the original version only shows a small piece. I downloaded climate data from the Copernicus Climate Data Store in monthly chunks for each variable. I then combine the months into a long time series via xr.concat([ds1, ds2], 'time') and then merge the variables via xr.merge([DS1, DS2])...
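For context, a minimal sketch of that workflow with synthetic stand-ins for the downloaded monthly files (the fake_month helper and variable names here are illustrative, not my actual download code):

```python
import numpy as np
import pandas as pd
import xarray as xr

def fake_month(var, start):
    """Stand-in for one downloaded monthly file of a single variable."""
    time = pd.date_range(start, periods=4, freq="6h")
    data = np.random.rand(len(time), 3, 3)
    return xr.Dataset({var: (("time", "latitude", "longitude"), data)},
                      coords={"time": time})

# Concatenate the months of one variable along time...
swh = xr.concat([fake_month("swh", "1959-01-01"),
                 fake_month("swh", "1959-01-02")], dim="time")
mpww = xr.concat([fake_month("mpww", "1959-01-01"),
                  fake_month("mpww", "1959-01-02")], dim="time")

# ...then merge the per-variable series into one dataset
ds = xr.merge([swh, mpww])
```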

Gabriel
  • You might take a look at the variable encoding (`ds.swh.encoding` and `ds.swh.attrs`). The first place I would look is to see if there are any fields related to masking, scaling, or valid_range. You might also add the output of `ncdump -hs Data.nc` to this post which would help diagnose things. – jhamman Jan 06 '23 at 16:35
    Thanks for your response @jhamman - I added the output to the post. Output of `ds.swh.attrs` is the same for both, but the encoding is... strange... I will also add it in the original post... – Gabriel Jan 10 '23 at 11:13

1 Answer


When using Xarray's open_mfdataset, concat, or merge, the first dataset's encoding is carried over to the combined dataset. This sometimes causes problems: here, the scale_factor/add_offset computed for the first monthly file cannot represent the full value range of the concatenated time series, so values outside that range are clipped on write. One way to avoid this is to specify your output encoding explicitly, or to reset the encoding and let Xarray choose for you.

import xarray as xr


ds = xr.open_mfdataset(...) # or other concat/merge steps

# reset variable encoding
for v in ds:
    ds[v].encoding = {}

ds.to_netcdf(...)
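A self-contained sketch of both options on a synthetic dataset (the file name Data_fixed.nc and the toy data are illustrative):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the merged dataset (values up to 10 m)
ds = xr.Dataset({"swh": (("time",), np.linspace(0.0, 10.0, 100))})

# Option 1: drop any stale packing so Xarray writes plain floats
for v in ds:
    ds[v].encoding = {}

# Option 2: request the on-disk dtype explicitly per variable
ds.to_netcdf("Data_fixed.nc", encoding={"swh": {"dtype": "float32"}})

roundtrip = xr.open_dataset("Data_fixed.nc")
print(float(roundtrip.swh.max()))  # the full value range survives the round trip
```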
jhamman