0

BLUF::

I'm having trouble computing zonal statistics with a rotated array using the rasterstats package. I'm guessing the problem is with my affine matrix, but I'm not completely sure. Below is the affine transform matrix and output:

| 951.79, 0.45, 2999993.57|
| 0.00,-996.15,-1985797.84|
| 0.00, 0.00, 1.00|

enter image description here

Background:

I am creating files for a groundwater flow model and need to compute zonal statistics for each model grid cell using some .csv data from the Puerto Rico Agricultural Water Management web portal. These data are available on a daily timestep for numerous parameters (e.g. ET, tmax, tmin, precip, etc.). These files are not georeferenced, but ancillary files are available that specify the lon/lat for each cell which can then be projected using pyproj:

import pandas as pd
import numpy as np
import pyproj

url_base = 'http://academic.uprm.edu/hdc/GOES-PRWEB_RESULTS'

# Load some data
f = '/'.join([url_base, 'actual_ET', 'actual_ET20090101.csv'])
array = pd.read_csv(f, header=None).values

# Read longitude values
f = '/'.join([url_base, 'NON_TRANSIENT_PARAMETERS', 'LONGITUDE.csv'])
lon = pd.read_csv(f, header=None).values

# Read latitude values
f = '/'.join([url_base, 'NON_TRANSIENT_PARAMETERS', 'LATITUDE.csv'])
lat = np.flipud(pd.read_csv(f, header=None).values)

# Project to x/y coordinates (North America Albers Equal Area Conic)
aea = pyproj.Proj('+init=ESRI:102008')
x, y = aea(lon, lat)

Before I can compute zonal statistics, I need to create the affine transform that relates row/column coordinates to projected x/y coordinates. I specify the 6 parameters to create the Affine object using the affine library:

import math
from affine import Affine

length_of_degree_longitude_at_lat_mean = 105754.71  # 18.25 degrees via http://www.csgnetwork.com/degreelenllavcalc.html
length_of_degree_latitude_at_lat_mean = 110683.25   # 18.25 degrees via http://www.csgnetwork.com/degreelenllavcalc.html

# Find the upper left x, y
xul, yul = aea(lon[0][0], lat[0][0])
xll, yll = aea(lon[-1][0], lat[-1][0])
xur, yur = aea(lon[0][-1], lat[0][-1])
xlr, ylr = aea(lon[-1][-1], lat[-1][-1])

# Compute pixel width
a = abs(lon[0][1] - lon[0][0]) * length_of_degree_longitude_at_lat_mean

# Row rotation
adj = abs(xlr - xll)
opp = abs(ylr - yll)
b = math.atan(opp/adj)

# x-coordinate of the upper left corner
c = xul - a / 2

# Compute pixel height
e = -abs(lat[1][0] - lat[0][0]) * length_of_degree_latitude_at_lat_mean

# Column rotation
d = 0

# y-coordinate of the upper left corner
f = yul - e / 2

affine = Affine(a, b, c, d, e, f)

where:

  • a = width of a pixel
  • b = row rotation (typically zero)
  • c = x-coordinate of the upper-left corner of the upper-left pixel
  • d = column rotation (typically zero)
  • e = height of a pixel (typically negative)
  • f = y-coordinate of the of the upper-left corner of the upper-left pixel

(from https://www.perrygeo.com/python-affine-transforms.html)

The resulting affine matrix looks reasonable and I've tried passing row and column rotation as both radians and degrees with little change in the result. Link to grid features: grid_2km.geojson

import rasterstats
import matplotlib.pyplot as plt

grid_f = 'grid_2km.geojson'
gdf = gpd.read_file(grid_f)

zs = rasterstats.zonal_stats(gdf, 
                             array, 
                             affine=affine,
                             stats=['mean'])

df = pd.DataFrame(zs).fillna(value=np.nan)

fig = plt.figure(figsize=(14, 6))

ax = fig.add_subplot(131, aspect='equal')
ax.pcolormesh(x, y, np.zeros_like(array))
ax.pcolormesh(x, y, array)
ax.set_title('Projected Data')

ax = fig.add_subplot(132, aspect='equal')
gdf.plot(ax=ax)
ax.set_title('Projected Shapefile')

ax = fig.add_subplot(133, aspect='equal')
ax.imshow(df['mean'].values.reshape((gdf.row.max(), gdf.col.max())))
ax.set_title('Zonal Statistics Output')

plt.tight_layout()
plt.show()

enter image description here

Further, there is a discrepancy between x, y value pairs transformed using the affine object versus those derived from the native lon, lat values using pyproj:

rr = np.array([np.ones(x.shape[1], dtype=np.int) * i for i in range(x.shape[0])])
cc = np.array([np.arange(x.shape[1]) for i in range(x.shape[0])])
x1, y1 = affine * (cc, rr)

fig = plt.figure(figsize=(14, 6))

ax = fig.add_subplot(111, aspect='equal')
ax.scatter(x, y, s=.2)
ax.scatter(x1, y1, s=.2)

plt.show()

enter image description here

Jason Bellino
  • 494
  • 1
  • 6
  • 18
  • If you have projected data and a projected shapefile, what makes you think its an affine issue? Are you using [rasterstats](https://pythonhosted.org/rasterstats/)? – dubbbdan Mar 26 '19 at 13:19
  • @dubbbdan Yes, I'm using the rasterstats package; I've updated the question so that it now shows the zonal statistics computations. x, y values can be returned from the affine by passing row, column pairs and these x, y values do not match those computed directly from the native lon, lat pairs using pyproj which suggests to me that it's a problem with the affine. See the additional figure I added to the post to illustrate. – Jason Bellino Mar 26 '19 at 17:13
  • rasterstats is a great package! The fact that `zonal_stats` returns data leads me to believe that everything is fine with you affine. Have you looked at passing the arguement `raster_out=True` in `zonal_stats`? – dubbbdan Mar 26 '19 at 17:26
  • I think the problem is that you are not plotting the zonal statistics properly. Reshaping `df['mean]' isnt enought to assoicate a projected grid with the resulting zonal statistic calculations. Can you provide a link to the shapefile (ideally as geojson)? – dubbbdan Mar 26 '19 at 17:35
  • @dubbbdan I added a link to the features which are hosted in a GitHub gist (https://gist.github.com/jbellino-usgs/6be521efec919c1ef5684e1452a069f9). The object returned by `zonal_stats` is a list of dicts with one dict for each zone feature in the grid shapefile. In this case each dictionary consists of one key and one value since I'm only asking for the mean. If I create a DataFrame from that list of dicts and then reshape the resulting column of data I end up with a numpy array with the same shape as the input grid features. – Jason Bellino Mar 26 '19 at 18:26
  • As an aside, I found some code within rasterstats that seems to hard-wire a presumption of orthogonality with respect to the Cartesian coordinate axes when reading raster data (https://github.com/perrygeo/python-rasterstats/issues/182). However, I still think I'm not correctly specifying my affine given the discrepancy between the two plots shown above. – Jason Bellino Mar 26 '19 at 18:31
  • lets chat https://chat.stackoverflow.com/rooms/190720/zonal-stats – dubbbdan Mar 26 '19 at 19:12

0 Answers0