I want to run a point-in-polygon query on 14 million NYC taxi trips to find out which of the 263 taxi zones each trip falls in.
I want to run the code on RAPIDS cuSpatial. I read a few forums and posts and came across a cuSpatial limitation: a point-in-polygon query can only handle 32 polygons per call. So I did the following to split my polygons into batches.
This is my taxi zone polygon file
cusptaxizone
(0 0
1 1
2 34
3 35
4 36
...
258 348
259 349
260 350
261 351
262 353
Name: f_pos, Length: 263, dtype: int32,
0 0
1 232
2 1113
3 1121
4 1137
...
349 97690
350 97962
351 98032
352 98114
353 98144
Name: r_pos, Length: 354, dtype: int32,
x y
0 933100.918353 192536.085697
1 932771.395560 191317.004138
2 932693.871591 191245.031174
3 932566.381345 191150.211914
4 932326.317026 190934.311748
... ... ...
98187 996215.756543 221620.885314
98188 996078.332519 221372.066989
98189 996698.728091 221027.461362
98190 997355.264443 220664.404123
98191 997493.322715 220912.386162
[98192 rows x 2 columns])
There are 263 polygons (taxi zones) in total. I want to run the queries in 11 batches of 24 polygons each.
import numpy as np

# Build the slice boundaries; `batches` is the number of polygons per batch
def create_iterations(start, end, batches):
    iterations = list(np.arange(start, end, batches))
    iterations.append(end)
    return iterations
pip_iterations = create_iterations(0, 264, 24)
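To check what these boundaries look like, here is a minimal plain-Python sketch of the same idea. `batch_bounds` is a hypothetical name used only for illustration, and it uses `range` instead of `np.arange` so it runs without NumPy.

```python
def batch_bounds(start, end, batch_size):
    """Return slice boundaries covering [start, end) in steps of batch_size."""
    bounds = list(range(start, end, batch_size))
    bounds.append(end)  # close the final (possibly shorter) batch
    return bounds

bounds = batch_bounds(0, 264, 24)

# Pair consecutive boundaries into (start, end) slices, one per batch
pairs = list(zip(bounds[:-1], bounds[1:]))

print(len(pairs))           # number of batches
print(pairs[0], pairs[-1])  # first and last (start, end) slice
```

With a step of 24 this gives 11 slices of at most 24 polygons each, which stays under the per-call limit mentioned above.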
# Loop over the polygon batches and run a point-in-polygon query for each one
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
    cuda_df['borough'] = " "
    for i in range(len(iter_batch)-1):
        # Use the boundaries passed in as iter_batch rather than the global pip_iterations
        start = iter_batch[i]
        end = iter_batch[i+1]
        pip = cuspatial.point_in_polygon(cuda_df['pickup_longitude'], cuda_df['pickup_latitude'],
                                         cuspatial_data[0][start:end],  # poly_offsets
                                         cuspatial_data[1],             # poly_ring_offsets
                                         cuspatial_data[2]['x'],        # poly_points_x
                                         cuspatial_data[2]['y']         # poly_points_y
                                         )
        # One boolean column per polygon in this batch; `col` avoids shadowing the outer `i`
        for col in pip.columns:
            cuda_df['borough'].loc[pip[col]] = polygon_name[col]
    return cuda_df
When I ran the function I received a type error. I wonder what might cause the issue?
pip_pickup = perform_pip(cutaxi, cusptaxizone, pip_iterations)
TypeError: perform_pip() missing 1 required positional argument: 'iter_batch'
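The message appears to come from how Python binds positional arguments, not from cuSpatial itself. The call passes three values to a function with four required parameters, so `pip_iterations` fills the third parameter (`polygon_name`) and nothing is left for `iter_batch`. Here is a minimal sketch that reproduces this with a stand-in function of the same signature; the string and list arguments are placeholders, not real data.

```python
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
    """Stand-in with the same signature as above; does no spatial work."""
    return cuda_df, cuspatial_data, polygon_name, iter_batch

try:
    # Three positional arguments: the list lands in polygon_name, not iter_batch
    perform_pip("trips", "zones", [0, 24, 48])
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# Supplying all four arguments (here by keyword) binds each one as intended
result = perform_pip("trips", "zones",
                     polygon_name=["zone A", "zone B"],  # placeholder names
                     iter_batch=[0, 24, 48])
print(result[3])
```

Passing the zone names as the third argument, or naming the arguments by keyword as in the second call, should make the binding explicit.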