How to speed up calculating geometry area of 2D boundary arrays?

Question

Background

I have four 2D arrays: lon_centers, lat_centers, lon_bnds, and lat_bnds. bnds means the boundary of each pixel and centers stands for the center of pixels.

Data overview

import numpy as np
import matplotlib.pyplot as plt

lon_bnds = np.array([[-77.9645  , -77.56074 , -77.162025, -76.76827 , -76.37937 ],
                    [-77.88815 , -77.48613 , -77.08915 , -76.69711 , -76.30993 ],
                    [-77.811676, -77.41139 , -77.01614 , -76.62582 , -76.24034 ],
                    [-77.73638 , -77.337814, -76.944275, -76.55565 , -76.17186 ],
                    [-77.66197 , -77.265114, -76.87326 , -76.48632 , -76.1042  ]])

lat_bnds = np.array([[-77.34674 , -77.35804 , -77.36858 , -77.378395, -77.38752 ],
                    [-77.28847 , -77.299614, -77.31001 , -77.31969 , -77.328674],
                    [-77.23022 , -77.24122 , -77.25147 , -77.26101 , -77.26986 ],
                    [-77.17193 , -77.182785, -77.192894, -77.20229 , -77.211006],
                    [-77.11363 , -77.12434 , -77.13431 , -77.14357 , -77.15215 ]])

lon_centers = np.array([[-77.72404 , -77.323685, -76.92833 , -76.53787 ],
                        [-77.64892 , -77.2503  , -76.85666 , -76.46791 ],
                        [-77.57335 , -77.17646 , -76.78454 , -76.3975  ],
                        [-77.499626, -77.10444 , -76.71421 , -76.32886 ]])

lat_centers = np.array([[-77.32333 , -77.334175, -77.344284, -77.353676],
                        [-77.264946, -77.27564 , -77.28561 , -77.29487 ],
                        [-77.20665 , -77.21719 , -77.22702 , -77.23614 ],
                        [-77.14826 , -77.15867 , -77.16835 , -77.17734 ]])

plt.scatter(lon_bnds, lat_bnds, label='corner')
plt.scatter(lon_centers, lat_centers, marker='x', c='r', label='center')
plt.legend()

Goal

The goal is to calculate the area (m2) of each pixel, which means area has the same shape as lon_centers and lat_centers. I found this useful question for calculating the area of one geo-polygon. However, it needs inputs in order and loop of the 2D arrays.

Attempt

from pyproj import Geod

geod = Geod(ellps="WGS84")

areas = []

for x in range(lon_bnds.shape[0]-1):
    for y in range(lon_bnds.shape[1]-1):
        lons = lon_bnds[x:x+2, y:y+2].ravel()
        lats = lat_bnds[x:x+2, y:y+2].ravel()
        lons[-2], lons[-1] = lons[-1], lons[-2]
        lats[-2], lats[-1] = lats[-1], lats[-2]

        poly_area, poly_perimeter = geod.polygon_area_perimeter(lons, lats)
        areas.append(poly_area)

areas = np.asarray(areas).reshape(lon_centers.shape)

The for loop is too slow if there're more pixels. Is it possible to speed it up?

You've provided no actual data and shown no attempt, i.e., no [mcve]. Asking for library recommendations is off topic. Please see [ask] and related help pages for more details. — Mad Physicist, May 17 '22 at 13:58
Post some sample data. I don't have access to your npy files and I shouldn't need them to plot 16+25 points. — Mad Physicist, May 17 '22 at 16:06
@MadPhysicist Got it. Switched to the small sample data now. — zxdawn, May 17 '22 at 16:24
Have you considered `numba` or caching it, for eg. `lru_cache`? From what I can see is that the arrays are being run in the for loop for multiple times. Caching it will be good in reducing unnecessary overload — , May 17 '22 at 16:30
@KevinChoonLiangYew I tried numba before, but it doesn't support pyproj. Glad that `lru_cache` works. — zxdawn, May 17 '22 at 20:48

score 1 · Answer 1 · answered May 17 '22 at 16:40

Here's something you can consider when you are doing multiple for loops onto arrays.

# Without caching
def function1():
    geod = Geod(ellps="WGS84")

    areas = []

    for x in range(lon_bnds.shape[0]-1):
        for y in range(lon_bnds.shape[1]-1):
            lons = lon_bnds[x:x+2, y:y+2].ravel()
            lats = lat_bnds[x:x+2, y:y+2].ravel()
            lons[-2], lons[-1] = lons[-1], lons[-2]
            lats[-2], lats[-1] = lats[-1], lats[-2]

            poly_area, poly_perimeter = geod.polygon_area_perimeter(lons, lats)
            areas.append(poly_area)
    return areas

# With caching
from functools import lru_cache

@lru_cache(maxsize=512)
def function2():
    geod = Geod(ellps="WGS84")

    areas = []

    for x in range(lon_bnds.shape[0]-1):
        for y in range(lon_bnds.shape[1]-1):
            lons = lon_bnds[x:x+2, y:y+2].ravel()
            lats = lat_bnds[x:x+2, y:y+2].ravel()
            lons[-2], lons[-1] = lons[-1], lons[-2]
            lats[-2], lats[-1] = lats[-1], lats[-2]

            poly_area, poly_perimeter = geod.polygon_area_perimeter(lons, lats)
            areas.append(poly_area)
    return areas

%timeit function1()
229 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit function2()
84.3 ns ± 2.8 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

If your arrays are really large, remember to make sure the `maxsize` for lru_cache is sufficient enough to store your result. So in the end, you can perform checking to see if the size of your areas vector matches the size of your data. — , May 17 '22 at 16:43
What's the principle of `maxsize`? Is it possible to set it based on the shape automatically? — zxdawn, May 17 '22 at 20:49
First run is sometimes slower. You're not caching anything with one function call, and you could have written `function2 = lru_cache(function1)` without copying the whole thing. — Mad Physicist, May 17 '22 at 21:32
@zxdawn I guess I don't have a definite answer for that, but you can always initialize the maxsize to be a large enough value 2^10 — , May 17 '22 at 22:58

Mad Physicist · Answer 2 · 2022-05-17T17:30:46.150

It's very convenient that you have the points arranged in a grid so you know how they match up. The general formula for the area of a planar quadrilateral given a list of points arranged in a counter-clockwise direction is:

A = 0.5 * ((x1 * y2 + x2 * y3 + x3 * y4 + x4 * y1) -
           (x2 * y1 + x3 * y2 + x4 * y3 + x1 * y4))

This won't do you much good if x and y are in degrees, but you can convert them to meters on a flat local approximation of the surface for a pretty good estimate. To do that, you need to know the local radius of the Earth, R. You can use the average radius of 6371km, or you can use R = sqrt((R1**2 * cos(lat))**2 + (R2**2 * sin(lat))**2) / ((R1 * cos(lat))**2 + (R2 * sin(lat))**2), with R1 = 6378.137km and R2 = 6356.752. Let's use the mean lat/lon of the data as the reference point:

R1 = 6378.137
R2 = 6356.752

lons = np.deg2rad(lon_bnds)
lats = np.deg2rad(lat_bnds)
R = np.sqrt(((R1**2 * np.cos(lats))**2 + (R2**2 * np.sin(lats))**2) / ((R1 * np.cos(lats))**2 + (R2 * np.sin(lats))**2))
lon_ref = np.mean(lons, axis=None)
lat_ref = np.mean(lats, axis=None)
x = R * (lons - lon_ref) * np.cos(lats)
y = R * (lats - lat_ref)

Here is a plot similar to your original, but converted to meters:

lon_c = np.deg2rad(lon_centers)
lat_c = np.deg2rad(lat_centers)
Rc = np.sqrt(((R1**2 * np.cos(lat_c))**2 + (R2**2 * np.sin(lat_c))**2) / ((R1 * np.cos(lat_c))**2 + (R2 * np.sin(lat_c))**2))
xc = Rc * (lon_c - lon_ref) * np.cos(lat_c)
yc = Rc * (lat_c - lat_ref)
plt.scatter(x, y)
plt.scatter(xc, yc, marker='x', c='r')

A = 0.5 * ((x[1:, :-1] * y[:-1, :-1] + x[1:, 1:] * y[1:, :-1] + x[:-1, 1:] * y[1:, 1:] + x[:-1, :-1] * y[:-1, 1:]) - (x[:-1, :-1] * y[1:, :-1] + x[1:, :-1] * y[1:, 1:] + x[1:, 1:] * y[:-1, 1:] + x[:-1, 1:] * y[:-1, :-1]))

The result is

array([[65.80938398, 64.91795373, 64.04493782, 63.19815103],
       [65.77719064, 64.87959934, 64.01155424, 63.16247495],
       [65.77861439, 64.87956589, 64.01289216, 63.16872493],
       [65.755155  , 64.85808632, 63.98664385, 63.14105705]])

Units are square kilometers.

For larger quadrangles, you may want to project each one individually about its center. In that case, you can do something like this:

def xy(s1, s2):
    x = R[s1, s2] * (lons[s1, s2] - lon_c) * np.cos(lats[s1, s2])
    y = R[s1, s2] * (lats[s1, s2] - lat_c)
    return x, y
x1, y1 = xy(slice(None, -1), slice(None, -1))
x2, y2 = xy(slice(None, -1), slice(1, None))
x3, y3 = xy(slice(1, None), slice(1, None))
x4, y4 = xy(slice(1, None), slice(None, -1))
A = 0.5 * ((x1 * y2 + x2 * y3 + x3 * y4 + x4 * y1) -
           (x2 * y1 + x3 * y2 + x4 * y3 + x1 * y4))

In this case, the result is similar:

array([[65.80950493, 64.9180908 , 64.04508934, 63.19831548],
       [65.77721436, 64.87964039, 64.01161095, 63.16254575],
       [65.77854026, 64.87951032, 64.01285345, 63.16870148],
       [65.75498209, 64.85793324, 63.98650883, 63.14093847]])

The areas computed this way are almost identical:

array([[65.80950493, 64.9180908 , 64.04508934, 63.19831548],
       [65.77721436, 64.87964039, 64.01161095, 63.16254575],
       [65.77854026, 64.87951032, 64.01285345, 63.16870148],
       [65.75498209, 64.85793324, 63.98650883, 63.14093847]])

The differences are just a few tens of square meters:

array([[-1.20941134e-04, -1.37064710e-04, -1.51528127e-04, -1.64456803e-04],
       [-2.37217200e-05, -4.10480434e-05, -5.67132960e-05, -7.08008247e-05],
       [ 7.41315753e-05,  5.55721762e-05,  3.87051454e-05,  2.34522307e-05],
       [ 1.72913145e-04,  1.53083044e-04,  1.35020126e-04,  1.18583183e-04]])

I tested it with this example: `lon_bnds = np.array([[-72.11712, -72.05148], [-72.12832, -72.06271]]); lat_bnds = np.array([[-15.524176, -15.503015], [-15.46161 , -15.440437]])`. The difference can be 0.27 km2. — zxdawn, May 17 '22 at 21:03
Here's the example with large difference (97.24525 v.s. 98.33767[pyproj]): `lon_bnds=np.array([[-147.48647, -147.45734], [-147.67628, -147.6483 ]]; lat_bnds=np.array([[70.50656 , 70.629135],[70.52047 , 70.64307 ]]); lon_center=-147.56708; lat_center=70.575165;` — zxdawn, May 17 '22 at 22:32
@zxdawn. I can believe that: the formulae here are a crappy approximation of Easting and Northing. You are getting ~1% accuracy, which is not that bad considering what you're doing to the trig. I'd have to do a bit of research on ellipsoids with a pen and paper to figure out how to compute the surface of an elliptical patch, or better yet, you can do it! — Mad Physicist, May 17 '22 at 23:03

How to speed up calculating geometry area of 2D boundary arrays?

Background

Data overview

Goal

Attempt

2 Answers2