
I want to run a point-in-polygon query on 14 million NYC taxi trips to find out which of the 263 taxi zones each trip was located in.

I want to run the code on RAPIDS cuSpatial. I read a few forums and posts and came across a cuSpatial limitation: a query can only cover 32 polygons per run. So I did the following to split my polygons into batches.

This is my taxi zone polygon file

cusptaxizone
(0        0
 1        1
 2       34
 3       35
 4       36
       ... 
 258    348
 259    349
 260    350
 261    351
 262    353
 Name: f_pos, Length: 263, dtype: int32,
 0          0
 1        232
 2       1113
 3       1121
 4       1137
        ...  
 349    97690
 350    97962
 351    98032
 352    98114
 353    98144
 Name: r_pos, Length: 354, dtype: int32,
                    x              y
 0      933100.918353  192536.085697
 1      932771.395560  191317.004138
 2      932693.871591  191245.031174
 3      932566.381345  191150.211914
 4      932326.317026  190934.311748
 ...              ...            ...
 98187  996215.756543  221620.885314
 98188  996078.332519  221372.066989
 98189  996698.728091  221027.461362
 98190  997355.264443  220664.404123
 98191  997493.322715  220912.386162
 
 [98192 rows x 2 columns])

There are 263 polygons (taxi zones) in total. I want to run the queries in 11 batches, with up to 24 polygons in each iteration.

def create_iterations(start, end, batches):
    iterations = list(np.arange(start, end, batches))
    iterations.append(end)
    return iterations


pip_iterations = create_iterations(0, 264, 24)


#loop to do point in polygon query in a table
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
    cuda_df['borough'] = " "
    for i in range(len(iter_batch)-1):
        start = pip_iterations[i]
        end = pip_iterations[i+1]
        pip = cuspatial.point_in_polygon(cuda_df['pickup_longitude'], cuda_df['pickup_latitude'],
                                         cuspatial_data[0][start:end],  #poly_offsets
                                         cuspatial_data[1],  #poly_ring_offsets
                                         cuspatial_data[2]['x'],  #poly_points_x
                                         cuspatial_data[2]['y']  #poly_points_y
                                        )

        for i in pip.columns:
            cuda_df['borough'].loc[pip[i]] = polygon_name[i]
    return cuda_df

When I ran the function I received a type error. I wonder what might cause the issue?

pip_pickup = perform_pip(cutaxi, cusptaxizone, pip_iterations)

TypeError: perform_pip() missing 1 required positional argument: 'iter_batch'
byc
  • `perform_pip() missing 1 required positional argument: 'iter_batch'`. You should pass four arguments to your function if it requires four arguments. – Nick Becker Dec 28 '20 at 16:17
  • There's a complete walkthrough of this topic in this notebook: https://github.com/rapidsai/cuspatial/blob/branch-0.18/notebooks/nyc_taxi_years_correlation.ipynb – Trenton Nov 24 '21 at 23:30

1 Answer


It looks like you are passing cutaxi as cuda_df, cusptaxizone as cuspatial_data, and pip_iterations as polygon_name. Nothing is passed for iter_batch, the fourth parameter in the function definition:

def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):

Hence you get the above error, which states that iter_batch is missing. As the comment above also points out, you are not passing the right number of arguments to perform_pip. If you edit your call to pass all four arguments, the error:

TypeError: perform_pip() missing 1 required positional argument: 'iter_batch'

would be resolved.
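
As a minimal sketch of the fix, the stub below has the same four-parameter signature as perform_pip. Its body and the argument values (including zone_names) are placeholders, not the real cuDF or cuSpatial objects. It only shows that three arguments reproduce the TypeError and four do not:

```python
def perform_pip(cuda_df, cuspatial_data, polygon_name, iter_batch):
    # Stub with the same signature as the function in the question.
    # The body is a placeholder that only echoes how arguments were bound.
    return {"polygon_name": polygon_name, "iter_batch": iter_batch}


# Placeholder values standing in for the real cuDF / cuSpatial objects.
cutaxi = "trips"
cusptaxizone = "zones"
zone_names = ["zone_a", "zone_b"]  # hypothetical list of polygon names
pip_iterations = [0, 24, 48]

try:
    # Three arguments: pip_iterations is bound to polygon_name,
    # so iter_batch is left without a value.
    perform_pip(cutaxi, cusptaxizone, pip_iterations)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Four arguments: every parameter receives a value.
result = perform_pip(cutaxi, cusptaxizone, zone_names, pip_iterations)
```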

saloni
  • ah! thanks for pointing it out @saloni! I loaded the taxi zone data again as a geopandas data frame and passed `taxizonegpd['borough']` as `polygon_name`. The `perform_pip` function now returns a cudf with an empty `borough` column. how come? – byc Dec 29 '20 at 10:40
  • I would check the output of the `for` loops to make sure they are behaving as expected. It seems that `cuda_df['borough'].loc[pip[i]] = polygon_name[i]` might not be writing anything to the dataframe `cuda_df`, which would leave the `borough` column empty – saloni Dec 30 '20 at 18:17
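
One way to check the loops is sketched below in plain Python, with nested lists standing in for the cuDF columns. The helper names `batch_bounds` and `label_points` and the toy data are hypothetical. The sketch assumes the result columns are numbered from 0 within each batch. If so, the batch's start offset has to be added before the column index is used to look up the polygon name. The inner loop also should not reuse the outer loop's variable `i`:

```python
def batch_bounds(n_polygons, batch_size):
    # Boundaries [0, batch_size, ..., n_polygons] used to slice the polygons.
    bounds = list(range(0, n_polygons, batch_size))
    bounds.append(n_polygons)
    return bounds


def label_points(hits_per_batch, bounds, polygon_names, n_points):
    # hits_per_batch[b][j][p] is True when point p lies inside the j-th
    # polygon of batch b; j is local to the batch, not a global index.
    labels = [""] * n_points
    for b, hits in enumerate(hits_per_batch):
        start = bounds[b]
        for j, mask in enumerate(hits):
            name = polygon_names[start + j]  # convert local index to global
            for p, inside in enumerate(mask):
                if inside:
                    labels[p] = name
    return labels


# 263 zones in batches of 24 gives 11 batches, the last one shorter.
bounds = batch_bounds(263, 24)

# Toy data: 3 polygons, batch size 2, one point inside each polygon.
names = ["A", "B", "C"]
toy_bounds = batch_bounds(3, 2)
toy_hits = [
    [[True, False, False], [False, True, False]],  # batch 0: polygons A, B
    [[False, False, True]],                        # batch 1: polygon C
]
labels = label_points(toy_hits, toy_bounds, names, 3)
```

With `polygon_names[j]` instead of `polygon_names[start + j]`, the third point would be labelled "A" rather than "C". On the cuDF side, a single `cuda_df.loc[mask, 'borough'] = name` may behave more predictably than the chained `cuda_df['borough'].loc[mask] = name`, though that is worth verifying against your cuDF version.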