
I'm fairly new to xarray. I want to modify the attributes of a NetCDF file in place, but the built-in function returns a new object instead.

import xarray as xr

ds = xr.open_dataset(file_)
# ds has "time" as one of its coordinates; I want to modify that coordinate's attributes.
# Here is ds, for clarity:
ds
>><xarray.Dataset>
Dimensions:  (lat: 361, lev: 1, lon: 720, time: 1)
Coordinates:
  * lon      (lon) float32 0.0 0.5 1.0 1.5 2.0 ... 357.5 358.0 358.5 359.0 359.5
  * lat      (lat) float32 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0
  * lev      (lev) float32 1.0
  * time     (time) timedelta64[ns] 00:00:00
Data variables:
    V        (time, lev, lat, lon) float32 ...
Attributes:
    Conventions:          CF
    constants_file_name:  P20000101_12
    institution:          IACETH
    lonmin:               0.0
    lonmax:               359.5
    latmin:               -90.0
    latmax:               90.0
    levmin:               250.0
    levmax:               250.0

I tried to assign a new attribute, but the call returns a new DataArray instead of changing ds:

newtimeattr = "some time" 
ds.time.assign_attrs(units=newtimeattr)

Alternatively, if I assign this attribute to the data variable "V", it adds another variable to the dataset instead:

ds['V '] = ds.V.assign_attrs(units='m/s')
# This added a second variable, so ds now appears to have two variables named V
ds  # trimmed output
>>Data variables:
    V        (time, lev, lat, lon) float32 ...
    V        (time, lev, lat, lon) float32 ...
Light_B

2 Answers


From the xarray docs for xarray.DataArray.assign_attrs:

Returns a new object equivalent to self.attrs.update(*args, **kwargs).

This means the method returns a new DataArray (or coordinate) with the updated attrs and leaves the original untouched. To update the dataset, you must assign the result back to it:

ds.coords["time"] = ds.time.assign_attrs(
    units=newtimeattr
)
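To make the "returns a new object" behaviour concrete, here is a minimal sketch. It assumes a small in-memory dataset as a hypothetical stand-in for the file in the question; only the variable name V is taken from the original.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the dataset in the question (tiny, in-memory).
ds = xr.Dataset(
    {"V": (("time", "lat"), np.zeros((1, 3), dtype="float32"))},
    coords={"time": [0], "lat": [-90.0, 0.0, 90.0]},
)

# assign_attrs returns a new object; the variable stored in ds is not modified.
updated = ds["V"].assign_attrs(units="m/s")
print("units" in ds["V"].attrs)  # False
print(updated.attrs["units"])    # m/s

# Assigning the result back under the same name updates the dataset.
ds["V"] = updated
print(ds["V"].attrs["units"])    # m/s
```

Discarding the return value, as in the question, is why nothing appeared to change.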

As you pointed out, this can also be done in place by setting the key directly on the attrs dictionary:

ds.time.attrs['units'] = newtimeattr
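As a rough illustration of the in-place route, the sketch below sets keys on attrs for both a coordinate and a data variable. The dataset is a hypothetical in-memory example, and the value of newtimeattr is the placeholder from the question.

```python
import numpy as np
import xarray as xr

# Hypothetical in-memory dataset with a "time" coordinate and a variable V.
ds = xr.Dataset(
    {"V": (("time",), np.zeros(1, dtype="float32"))},
    coords={"time": [0]},
)

newtimeattr = "some time"  # placeholder value, as in the question

# attrs behaves like a dict attached to the variable held by ds,
# so item assignment changes the dataset directly, with no reassignment.
ds["time"].attrs["units"] = newtimeattr
ds["V"].attrs["units"] = "m/s"

print(ds.time.attrs)   # attribute-style access reads the same attrs
print(ds["V"].attrs)
```

The bracket form ds["time"] and the attribute form ds.time reach the same variable; the bracket form avoids any clash with method or attribute names on the dataset.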

One clarification: your last statement adds a new variable because you assigned the result to ds['V '], with a trailing space in the name. Since 'V ' != 'V' in Python, this created a second variable holding the values of the original ds.V plus the updated attributes. Without the space, your approach works fine:

ds['V'] = ds.V.assign_attrs(units='m/s')
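If the stray variable has already been created, a sketch like the following can help spot and remove it. It assumes a hypothetical in-memory dataset; printing the names with repr makes the trailing space visible.

```python
import numpy as np
import xarray as xr

# Hypothetical in-memory dataset with a single variable named "V".
ds = xr.Dataset({"V": (("lat",), np.zeros(3, dtype="float32"))})

# Reproduce the problem: the key "V " (trailing space) is a different name.
ds["V "] = ds["V"].assign_attrs(units="m/s")
names_before = sorted(ds.data_vars)
print([repr(name) for name in names_before])  # repr shows the trailing space

# Drop the accidental variable, then assign under the exact name.
ds = ds.drop_vars("V ")
ds["V"] = ds["V"].assign_attrs(units="m/s")
print(list(ds.data_vars))
```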
Michael Delgado
ds.V.attrs['units'] = 'm/s'

worked for me. The same approach works for "time", which is a dimension coordinate:

ds.time.attrs['units'] = newtimeattr
Light_B
  • Hmm, I wonder why this syntax is the standard: `ds.time.attrs['units'] = newtimeattr`. One can easily confuse `time` with a Python method or variable. I prefer syntax like `ds['time'].attrs['units'] = newtimeattr`, because here `time` is clearly a NetCDF variable. Fortunately, it works. – jurajb Dec 14 '21 at 11:01