3

I work with satellite data organised on an irregular two-dimensional grid whose dimensions are scanline (along track dimension) and ground pixel (across track dimension). Latitude and longitude information for each ground pixel are stored in auxiliary coordinate variables.

Given a (lat, lon) point, I would like to identify the closest ground pixel on my set of data.

Let's build a 10x10 toy data set:

import numpy as np
import xarray as xr
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
%matplotlib inline

lon, lat = np.meshgrid(np.linspace(-20, 20, 10), 
                       np.linspace(30, 60, 10))
lon += lat/10
lat += lon/10
da = xr.DataArray(data = np.random.normal(0,1,100).reshape(10,10), 
                  dims=['scanline', 'ground_pixel'],
                  coords = {'lat': (('scanline', 'ground_pixel'), lat),
                            'lon': (('scanline', 'ground_pixel'), lon)})

ax = plt.subplot(projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, cmap=plt.cm.get_cmap('Blues'), 
                   infer_intervals=True);
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines(draw_labels=True)
plt.tight_layout()

europe

Note that the lat/lon coordinates identify the centre pixel and the pixel boundaries are automatically inferred by xarray.

Now, say I want to identify the closest ground pixel to Rome.

The best way I came up with so far is to use a scipy's kdtree on a stacked flattened lat/lon array:

from scipy import spatial
pixel_center_points = np.stack((da.lat.values.flatten(), da.lon.values.flatten()), axis=-1)
tree = spatial.KDTree(pixel_center_points)

rome = (41.9028, 12.4964)
distance, index = tree.query(rome)
print(index)
# 36

I have then to apply unravel_index to get my scanline/ground_pixel indexes:

pixel_coords = np.unravel_index(index, da.shape)
print(pixel_coords)
# (3, 6)

Which gives me the scanline/ground_pixel coordinates of the (supposedly) closest ground pixel to Rome:

ax = plt.subplot(projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, cmap=plt.cm.get_cmap('Blues'), 
                   infer_intervals=True);
ax.scatter(da.lon[pixel_coords], da.lat[pixel_coords], 
           marker='x', color='r', transform=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines(draw_labels=True)
plt.tight_layout()

enter image description here

I'm convinced there must me a much more elegant way to approach this problem. In particular, I would like to get rid of the flattening/unraveling steps (all my attempts to build a kdtree on a two-dimensional array failed miserably), and make use of my xarray dataset's variables as much as possible (adding a new centre_pixel dimension for example, and using it as input to KDTree).

jhamman
  • 5,867
  • 19
  • 39
stm4tt
  • 755
  • 1
  • 5
  • 22

1 Answers1

6

I am going to answer my own question as I believe I came up with a decent solution, which is discussed at much greater length on my blog post on this subject.

Geographical distance

First of all, defining the distance between two points on the earth's surface as simply the euclidean distance between the two lat/lon pairs could lead to inaccurate results depending on the distance between two points. It is thus better to transform the coordinates to ECEF coordinates first and built a KD-Tree on the transformed coordinates. Assuming points on the surface of the planet (h=0) the coordinate transformation is done as such:

def transform_coordinates(coords):
    """ Transform coordinates from geodetic to cartesian

    Keyword arguments:
    coords - a set of lan/lon coordinates (e.g. a tuple or 
             an array of tuples)
    """
    # WGS 84 reference coordinate system parameters
    A = 6378.137 # major axis [km]   
    E2 = 6.69437999014e-3 # eccentricity squared    

    coords = np.asarray(coords).astype(np.float)

    # is coords a tuple? Convert it to an one-element array of tuples
    if coords.ndim == 1:
        coords = np.array([coords])

    # convert to radiants
    lat_rad = np.radians(coords[:,0])
    lon_rad = np.radians(coords[:,1]) 

    # convert to cartesian coordinates
    r_n = A / (np.sqrt(1 - E2 * (np.sin(lat_rad) ** 2)))
    x = r_n * np.cos(lat_rad) * np.cos(lon_rad)
    y = r_n * np.cos(lat_rad) * np.sin(lon_rad)
    z = r_n * (1 - E2) * np.sin(lat_rad)

    return np.column_stack((x, y, z))

Building the KD-Tree

We could then build the KD-Tree by transforming our dataset coordinates, taking care of flattening the 2D grid to a one-dimensional sequence of lat/lon tuples. This is because the KD-Tree input data needs to be (N,K), where N is the number of point and K is the dimensionality (K=2 in our case, as we assume no heigth component).

# reshape and stack coordinates
coords = np.column_stack((da.lat.values.ravel(),
                          da.lon.values.ravel()))

# construct KD-tree
ground_pixel_tree = spatial.cKDTree(transform_coordinates(coords))

Querying the tree and indexing the xarray dataset

Querying the tree is now as simple as transforming our point's lat/lon coordinates to ECEF and passing those to the tree's query method:

rome = (41.9028, 12.4964)
index = ground_pixel_tree.query(transform_coordinates(rome))

In doing so though, we need to unravel our index on the original dataset's shape, to get the scanline/ground_pixel indexes:

index = np.unravel_index(index, self.shape)

We could now use the two components to index our original xarray dataset, but we could also build two indexers to use with xarray pointwise indexing feature:

index = xr.DataArray(index[0], dims='pixel'), \
        xr.DataArray(index[1], dims='pixel')

Getting the closest pixel is now easy and elegant at the same time:

da[index]

Note that we could also query more than one point at once, and by building the indexers as above, we could still index the dataset with a single call:

da[index]

Which would then return a subset of the dataset containing the closest ground pixels to our query points.

Further readings

  • Using the euclidean norm on the lat/lon tuples could be accurate enough for smaller distance (thing of it as approximating the earth as flat, it works on a small scale). More details on geographic distances here.
  • Using a KD-Tree to find the nearest neighbour is not the only way to address this problem, see this very comprehensive article.
  • An implementation of KD-Tree directly into xarray is in the pipeline.
  • My blog post on the subject.
stm4tt
  • 755
  • 1
  • 5
  • 22
  • 2
    As you've noted, this feature is within xarray's scope but is just waiting for an interested user/developer to implement it. It looks like you're well on your way to making that happen. I encourage you to submit a PR to the xarray project if you'd like to see this feature in the package. – jhamman Jan 02 '18 at 04:59