1

I am trying to assign new coordinates to a xarray DataArray's multiIndex.

I have a dataArray which contains 2 main dimensions ('longitude', 'latitude), and a single multiindex ('states').

Here is the DataArray structe:

print(dataArray)

<xarray.DataArray (longitude: 5000, latitude: 3000)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
Coordinates:
  * longitude  (longitude) float64 -145.0 -145.0 -144.9 ... -15.05 -15.03 -15.0
  * latitude   (latitude) float64 -85.0 -84.96 -84.93 ... 24.93 24.96 25.0
    states     (latitude, longitude) float64 nan nan nan nan ... nan nan nan nan

The 'states' multiindex contains solely interger values, and I would like to convert them, or add a second multiIndex with "named coordinates" (i.e.: USA, Italy, Germany, Brazil...).

Once with a named "states" multiindex, one could easily select a given state by its proper name - from the available index.

Below is a reproducible script. It has been taken from here :

import pandas as pd
pd.set_option('display.width', 50000)
pd.set_option('display.max_rows', 50000)
pd.set_option('display.max_columns', 5000)
import geopandas
from rasterio import features
from affine import Affine
import numpy as np
import xarray as xr
from cartopy.io import shapereader



def transform_from_latlon(lat, lon):
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    trans = Affine.translation(lon[0], lat[0])
    scale = Affine.scale(lon[1] - lon[0], lat[1] - lat[0])
    return trans * scale

def rasterize(shapes, coords, fill=np.nan, **kwargs):
    """Rasterize a list of (geometry, fill_value) tuples onto the given
    xray coordinates. This only works for 1d latitude and longitude
    arrays.
    """
    transform = transform_from_latlon(coords['latitude'], coords['longitude'])
    out_shape = (len(coords['latitude']), len(coords['longitude']))
    raster = features.rasterize(shapes, out_shape=out_shape,
                                fill=fill, transform=transform,
                                dtype=float, **kwargs)
    return xr.DataArray(raster, coords=coords, dims=('latitude', 'longitude'))



if '__main__' == __name__:

    # this shapefile is from natural earth data
    # http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-1-states-provinces/



    resolution = '10m'
    category = 'cultural'
    name = 'admin_0_countries'

    shpfilename = shapereader.natural_earth(resolution, category, name)

    # read the shapefile using geopandas
    states = geopandas.read_file(shpfilename)

    South_America = states[states['SUBREGION'] == 'South America'].reset_index(drop=True)


    state_ids = {k: i for i, k in enumerate(South_America['NAME_LONG'])}
    shapes = [(shape, n) for n, shape in enumerate(South_America.geometry)]

    LONGITUDE = np.linspace(-145, -15, num=5000)
    LATITUDE = np.linspace(-85, 25, num=3000)


    ds = xr.DataArray(coords=(LONGITUDE, LATITUDE), dims=['longitude', 'latitude'])


    ds['states'] = rasterize(shapes, ds.coords)


    # trying to assign new coordinates to the dimension:

    try:
        ds = ds.assign_coords(states = South_America['NAME_LONG'])
    except ValueError:
        print("message error", "cannot add coordinates with new dimensions to a DataArray")



    # ds = ds.expand_dims({'names':South_America['NAME_LONG']})  # --> this does not work

    Array = np.random.randn(LATITUDE.size, LONGITUDE.size)

    dArray_Brazil = xr.DataArray(Array, coords=(LATITUDE, LONGITUDE), dims=['latitude', 'longitude'])



    import matplotlib.pyplot as plt
    quadmash = dArray_Brazil.plot()
    ax = ds.states.where(ds.states != 'Brazil').plot(ax=quadmash.axes)

    plt.show()

ideally, I would like to have a DataArray structure as one of the two options below:

option 1)

<xarray.DataArray (longitude: 5000, latitude: 3000)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
Coordinates:
  * longitude  (longitude) float64 -145.0 -145.0 -144.9 ... -15.05 -15.03 -15.0
  * latitude   (latitude) float64 -85.0 -84.96 -84.93 ... 24.93 24.96 25.0
    states     (latitude, longitude) string Brazil, USA Germany ...

option 2)

 <xarray.DataArray (longitude: 5000, latitude: 3000)>
    array([[nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan],
           ...,
           [nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan]])
    Coordinates:
      * longitude  (longitude) float64 -145.0 -145.0 -144.9 ... -15.05 -15.03 -15.0
      * latitude   (latitude) float64 -85.0 -84.96 -84.93 ... 24.93 24.96 25.0
        states     (latitude, longitude) float64 nan nan nan nan ... nan nan nan nan
        Named_states     (latitude, longitude) string Brazil, USA Germany ...
Philipe Riskalla Leal
  • 954
  • 1
  • 10
  • 28

1 Answers1

2

Here's one way to replace coordinate values:

temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
states = [[1,2],[3,2]]

ds = xr.Dataset({'temperature': (['x', 'y', 'time'],  temp),
        'precipitation': (['x', 'y', 'time'], precip)},
        coords={'lon': (['x', 'y'], lon),
        'lat': (['x', 'y'], lat),
        'time': pd.date_range('2014-09-06', periods=3),
        'states': (['x','y'], states)})

ds
<xarray.Dataset>
Dimensions:        (time: 3, x: 2, y: 2)
Coordinates:
    lon            (x, y) float64 -99.83 -99.32 -99.79 -99.23
    lat            (x, y) float64 42.25 42.21 42.63 42.59
  * time           (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    states         (x, y) int64 1 2 3 2
Dimensions without coordinates: x, y
Data variables:
    temperature    (x, y, time) float64 1.096 19.28 16.27 ... 19.25 20.38 4.981
    precipitation  (x, y, time) float64 9.09 7.486 2.288 ... 3.639 0.6625 8.19
transdict = {'1':'Brazil', '2':'Germany', '3':'USA'} # need dictionary for all mappings
ds.states.values = ds.states.astype(str)

for key, value in transdict.items():
    ds.states.values = np.where(ds.states.values == key, value, ds.states.values)
    ds
<xarray.Dataset>
Dimensions:        (time: 3, x: 2, y: 2)
Coordinates:
    lon            (x, y) float64 -99.83 -99.32 -99.79 -99.23
    lat            (x, y) float64 42.25 42.21 42.63 42.59
  * time           (time) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08
    states         (x, y) <U21 'Brazil' 'Germany' 'USA' 'Germany'
Dimensions without coordinates: x, y
Data variables:
    temperature    (x, y, time) float64 1.096 19.28 16.27 ... 19.25 20.38 4.981
    precipitation  (x, y, time) float64 9.09 7.486 2.288 ... 3.639 0.6625 8.19
bwc
  • 1,028
  • 7
  • 18
  • Dear BWC, thank you for your return. As it seems, there is no simple way to broadcast the mapping operation. It would be great to speed things up. Nonetheless, your solution is great, indeed. It solves the problem. I will still focus on some kind of concatenation, merge, or assign_coordinate approach. – Philipe Riskalla Leal Feb 14 '20 at 23:31