Summary/simplified version
Given a list of track points defined by three 1-dimensional arrays (lats
, lons
and dtime
all with same length) and a gridded 3-dimensional array rr
(defined by 2-D lat_radar
, lon_radar
coordinate arrays and a 1-dimensional time array time_radar
) I want to extract all the grid values in rr
where the coordinates (latitude, longitude AND time included) are closest to the three 1-dimensional arrays.
I've managed to use cKDTree
to select points in space but I don't know how to generalize the solution to space & time together. Right now I have to do the selection on time separately and it makes the code quite bulky and hard to read.
for more details about this problem see hereinafter
Extended version
I'm trying to develop an app that uses precipitation data obtained from weather radar composites to predict the precipitation along a track. Most apps usually predict the precipitation at a point without considering the point moving in time.
The idea is, given points identifying a track in space and time, find the closest grid points from radar data to obtain a precipitation estimate over the track (see plot). The final goal would be to shift the start time to identify the best time to leave to avoid rain.
I just optimized my previous algorithm, that was using plain loops, to use cKDTree
from scipy
. Execution time went down from 30s to 380ms :). However I think the code can still be optimized. Here is my attempt.
As input we have
lons
,lats
: coordinates of the track as N-dimensional arraysdtime
:timedelta
T-dimensional array containing the time elapsed on the tracklon_radar
,lat_radar
: M x P matrices containing the coordinates of the radar datadtime_radar
:timedelta
Q-dimensional array containing the radar forecastrr
: M x P X Q array containing the radar forecast at every time step
First find the grid points closest to the trajectory using cKDTree
:
combined_x_y_arrays = np.dstack([lon_radar.ravel(),
lat_radar.ravel()])[0]
points_list = list(np.vstack([lons, lats]).T)
def do_kdtree(combined_x_y_arrays, points):
mytree = cKDTree(combined_x_y_arrays)
dist, indexes = mytree.query(points)
return indexes
results = do_kdtree(combined_x_y_arrays, points_list)
# As we have many duplicates, since the itinerary has a much higher resolution than the radar,
# we only select the unique points
inds_itinerary = np.unique(results)
lon_lat_itinerary = combined_x_y_arrays[inds_itinerary]
then find the closest points in the track to subset it. It doesn't make sense to have a track resolution of 10 m if the radar only has grid points every km.
combined_x_y_arrays = np.vstack([lons, lats]).T
points_list = list(lon_lat_itinerary)
results = do_kdtree(combined_x_y_arrays, points_list)
Now we can use these positions to get the elapsed time on the trajectory and the relative time steps in radar data
dtime_itinerary = dtime[results]
# find indices of these dtimes in radar dtime
inds_dtime_radar = np.abs(np.subtract.outer(dtime_radar, dtime_itinerary)).argmin(0)
Now we have everything that we need to find the precipitation so we only need one last loop. I also loop on shifts
to obtain prediction with different start times.
shifts = (1, 3, 5, 7, 9)
rain = np.empty(shape=(len(shifts), len(inds_itinerary)))
for i, shift in enumerate(shifts):
temp = []
for i_time, i_space in zip(inds_dtime_radar, inds_itinerary):
temp.append(rr[i_time+shift].ravel()[i_space])
rain[i, :] = temp
In particular I would like to find a way to combine the time search with the lat-lon search for the closest points.