0

I'm working with an xarray Dataset with four dimensions (lon, lat, lev, time) and several variables holding information on the concentration of aerial pollutants (e.g. the data variable CO has the concentration of CO at some (time, lev, lat, long) time and location). See below:

<xarray.Dataset>
Dimensions:            (lon: 1440, lat: 721, lev: 1, time: 2160)
Coordinates:
  * lon                (lon) float64 -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * lat                (lat) float64 -90.0 -89.75 -89.5 ... 89.5 89.75 90.0
  * lev                (lev) float64 72.0
  * time               (time) datetime64[ns] 2018-01-01T00:30:00 ... 2018-01-...
Data variables:
    CO                 (time, lev, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    NO2                (time, lev, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>

I'm trying to do two things:

  1. Iterate over this Dataset's time dimension and access a specific variable, NO2, at each timestep. This variable contains a set of floats with dimensions (1, 1, 721, 1440) for (time, lev, lat, long) in a single timestep. I want to determine the mean and standard deviation of all the NO2 values within that timestep and then apply a normalization formula to every NO2 value, changing the values. I then want to move to the next timestep and repeat this process.

  2. Once all NO2 values have been appropriately modified, I want to export the Dataset as a .nc file.

At first, I used an array to hold the normalized values (see below code snippet):

norm_NO2 = []

for i in range(0, timesteps - 1):
    NO2 = data['NO2'].isel(time=i).compute().data.flatten()
    timestep_mean = np.mean(NO2)
    timestep_SD = np.std(NO2)
    timestep_normalized = (1 / timestep_SD) * (np.log(NO2 + 10 ** -32)) - timestep_mean
    norm_NO2.append(timestep_normalized)

However, I wasn't able to figure out a way to convert the array into a .nc file (either by converting array -> Dataset -> .nc, or array -> .nc).

I also tried iterating over the Dataset itself with:

for time in data.transpose("time", ...):

As my loop, but didn't get very far on that front.

I'd definitely appreciate some guidance on either of these (or a separate) approach! So long as I normalize the NO2 values at each timestep and export as a .nc, anything works. I'm brand new to working with xarray and wasn't finding anything that answered this specific question in the docs, so I've been struggling to determine the best way to go about this.

1 Answers1

0

I am not sure if I properly understood your question but I suggest some ideas. It seems that what you want to achieve could be done using xarray matrix operations instead of iterating through time.

# Compute the mean along specific dimensions:   
NO2_mean = data['NO2'].mean(dim = ['lon', 'lat', 'lev'], keep_attrs = True) # NO2_mean is therefore a spatial mean, only varying along time
# Compute the STD along specific dimensions:
NO2_sd = data['NO2'].std(dim = ['lon', 'lat', 'lev'], keep_attrs = True)
# Process the full data, using spatial means and std previously obtained:
NO2_normalized = (1 / NO2_sd) * (np.log(data['NO2'] + 10 ** -32)) - NO2_mean

Note that in your case, as you are working with a lat/lon grid, you may want to do a weighted spatial mean (taking into account that cells have a different area depending on their latitude). See also this answer and this documentation.

Then you can include this xarray.Dataarray in your dataset: data['NO2_norm'] = NO2_normalized, or in a new dataset: new_ds = xr.Dataset({'NO2_norm': NO2_normalized})

Alex
  • 26
  • 5