I am working with some xarray data arrays which have some data at particular latitude/longitude coordinates.

For each lat/long coordinate pair in data array 1 (da1) I want to find the closest lat/long pair in data array 2 (da2).

Following this StackOverflow answer, a solution that seems to work is:


import xarray as xr

lats = xr.DataArray(da1.latitude.data, dims='z')  # 'z' is an arbitrary placeholder name
lons = xr.DataArray(da1.longitude.data, dims='z')
data = da2.sel(latitude=lats, longitude=lons, method='nearest')

This returns data, a data array with the same length as da1.

My questions are:

  • How does the nearest method trade off "nearness" in each of the latitude and longitude coordinates?

For example, one can imagine a case where the match in longitude is very close and the match in latitude is a bit worse, compared with the opposite case where the match in longitude is not so close but the match in latitude is very close. By what metric does the 'nearest' method judge this?

  • When setting a tolerance, does this tolerance apply to the latitude and longitude separately?

  • What is the default tolerance?

user1887919

1 Answer

xarray's selection algorithms work on each dimension of the data independently. The 'nearness' matching is handled by each index's query method. Most indices in xarray are pandas indices wrapped in an xr.core.indexes.PandasIndex object, whose query method simply calls the underlying pandas Index object's get_loc method. From the pandas API reference:

nearest: use the NEAREST index value if no exact match. Tied distances are broken by preferring the larger index value.
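To make the per-dimension behaviour concrete, here is a minimal sketch that uses plain pandas indexes rather than xarray itself. The index values and query points are invented for illustration, and `Index.get_indexer` is used as a stand-in for whatever call xarray makes internally in a given version; treat it as an approximation of the behaviour described above, not a copy of xarray's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-D coordinate axes of a regular grid (the role played by da2).
lat_index = pd.Index([0.0, 10.0, 20.0])
lon_index = pd.Index([0.0, 1.0, 2.0])

# Hypothetical query points (the role played by da1's coordinates).
query_lats = np.array([4.0, 12.0])
query_lons = np.array([0.4, 1.6])

# Each axis is searched on its own; no combined 2-D distance is ever computed.
lat_pos = lat_index.get_indexer(query_lats, method="nearest")
lon_pos = lon_index.get_indexer(query_lons, method="nearest")

# Pair up the independently matched coordinates.
matched = list(zip(lat_index[lat_pos], lon_index[lon_pos]))

# A tolerance is likewise checked per axis, in the units of that axis.
# In plain pandas, -1 marks "no value within tolerance".
lat_pos_tol = lat_index.get_indexer(query_lats, method="nearest", tolerance=3.0)
```

Because the latitude and longitude lookups never see each other, there is no trade-off between the two: each query point gets the closest latitude and, separately, the closest longitude. By the same logic, a tolerance passed to `sel` would be applied to each coordinate separately. As far as I can tell from its signature, `sel` defaults to `tolerance=None`, meaning no limit, and I believe xarray raises a `KeyError` on a miss rather than returning -1.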

Note that this matching is done in cartesian space (i.e., just based on the numbers). So, even leaving aside your point about multiple dimensions, nearest-neighbor matching along a single dimension can introduce errors if your x and y coordinates don't map linearly to physical distances (e.g. if your x and y points represent pixels on some projection).

This works great for many applications where an approximate nearest neighbor is fine, or if you are happy working in cartesian space. If not, you should probably use a geospatial library, or some other library that explicitly handles the coordinate space you are working in, to find the coordinates of the nearest point(s).
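As a rough illustration of why the coordinate space matters, the sketch below compares "nearest in degrees" with "nearest by great-circle distance" for one invented query point and two invented candidates at high latitude. The haversine formula on a spherical Earth is an assumption made for simplicity, and all names here are hypothetical; a dedicated geospatial library would normally handle this.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def degree_distance(p, q):
    """Euclidean distance on the raw (lat, lon) numbers, in degrees."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hypothetical (lat, lon) points near the pole, where meridians converge.
query = (80.0, 0.0)
candidates = {"A": (80.0, 10.0), "B": (84.0, 0.0)}

nearest_by_degrees = min(
    candidates, key=lambda k: degree_distance(query, candidates[k]))
nearest_by_great_circle = min(
    candidates, key=lambda k: haversine_km(*query, *candidates[k]))
```

Here candidate B is closer in raw degrees (4 versus 10), but candidate A is closer on the sphere, because 10 degrees of longitude at 80 degrees latitude spans far less ground than 4 degrees of latitude.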

Michael Delgado