I need to intersect a dataframe of 25 million UTM points with a list of polygons (a point-in-polygon problem). My code takes a long time (2 hours). Can anyone suggest a way to speed this up? Thank you.
The original data is in x, y and I transform it to lat/lon with utm.to_latlon.
Side note: if I use rawdata_df['Lat-Lon']=utm.to_latlon(rawdata_df['//X'], rawdata_df['Y'], 29, 'T'), it raises this error:
ValueError: Length of values (2) does not match length of index (24238735)
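The length of 2 in that message is consistent with utm.to_latlon returning a (lat, lon) tuple, which cannot be stored in a single column. A minimal sketch of unpacking such a tuple into two columns with one call; `fake_to_latlon` is a hypothetical stand-in that only mimics the return shape (it is not a real projection), and the column names and values are made up for illustration:

```python
import numpy as np
import pandas as pd

def fake_to_latlon(easting, northing):
    """Hypothetical stand-in: returns a (lat, lon) tuple like utm.to_latlon does.
    The arithmetic is NOT a real projection, it only produces two arrays."""
    return np.asarray(northing) * 1e-5, np.asarray(easting) * 1e-5

# Made-up sample coordinates
df = pd.DataFrame({"X": [500000.0, 500100.0, 500200.0],
                   "Y": [4500000.0, 4500100.0, 4500200.0]})

# The result is a tuple of 2 arrays, so it has length 2 regardless of the row count
result = fake_to_latlon(df["X"], df["Y"])

# Unpack the tuple into two columns; the conversion runs once instead of twice
df["Lat"], df["Lon"] = result
```

With the real function, the same pattern would be `rawdata_df['Lat'], rawdata_df['Lon'] = utm.to_latlon(...)`, which also avoids running the conversion a second time.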
Here is my code:
for rawdata_df in df:  # I am reading chunks of 25 million points from a larger file
    print('lat')
    rawdata_df['Lat']=utm.to_latlon(rawdata_df['ABZ_X'], rawdata_df['ABZ_Y'], 29, 'T')[0]
    print('lon')
    rawdata_df['Lon']=utm.to_latlon(rawdata_df['ABZ_X'], rawdata_df['ABZ_Y'], 29, 'T')[1]
    print('convert to geodataframe')
    gdf = geopandas.GeoDataFrame(rawdata_df, geometry=geopandas.points_from_xy(rawdata_df.Lon, rawdata_df.Lat))
    gdf.set_crs(crs=None, epsg=3763, inplace=True, allow_override=False)
    gdf_clip=geopandas.GeoDataFrame()
    print('clip')
    counter=1
    for pol in poligon_lst:
        print('clipping :',counter,'de',len(poligon_lst),sep=' ')
        gdf_clip=gdf.clip(pol[1])
        df = pd.DataFrame(gdf_clip, copy=True)
        iff.write_df(df,out_dir+pol[0]+'_cloud.txt') #own function for writing dataframes
        counter+=1
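
For comparison, here is a minimal sketch of a point-in-polygon test done directly on NumPy arrays: a bounding-box prefilter followed by an even-odd (ray casting) test. This is not what gdf.clip does internally. It assumes simple polygons without holes, points and polygon vertices in the same coordinate system, and that points lying exactly on an edge do not need to be classified reliably. The function name `points_in_polygon` is made up for illustration.

```python
import numpy as np

def points_in_polygon(x, y, poly_xy):
    """Even-odd test of many points against one simple polygon (no holes).

    x, y    : 1-D arrays of point coordinates
    poly_xy : (n, 2) array of polygon vertices in the same coordinate system
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    poly_xy = np.asarray(poly_xy, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)

    # Cheap prefilter: only points inside the bounding box need the exact test
    xmin, ymin = poly_xy.min(axis=0)
    xmax, ymax = poly_xy.max(axis=0)
    cand = np.flatnonzero((x >= xmin) & (x <= xmax) & (y >= ymin) & (y <= ymax))
    if cand.size == 0:
        return inside

    cx, cy = x[cand], y[cand]
    hit = np.zeros(cand.size, dtype=bool)
    x0, y0 = poly_xy[-1]  # start with the edge that closes the ring
    for x1, y1 in poly_xy:
        # Edges that straddle the horizontal line through each point
        crosses = (y0 > cy) != (y1 > cy)
        # x where the edge meets that line; horizontal edges never cross,
        # so the division by zero they cause is masked out by `crosses`
        with np.errstate(divide="ignore", invalid="ignore"):
            x_at = x0 + (cy - y0) * (x1 - x0) / (y1 - y0)
        # Flip the state for every crossing to the right of the point
        hit ^= crosses & (cx < x_at)
        x0, y0 = x1, y1

    inside[cand] = hit
    return inside

# Small made-up example: a 10 x 10 square and four points
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
px = np.array([5.0, 15.0, -1.0, 1.0])
py = np.array([5.0, 5.0, -1.0, 9.0])
mask = points_in_polygon(px, py, square)
```

If pol[1] is a shapely Polygon (an assumption, the listing does not say), its vertices could be obtained with np.asarray(pol[1].exterior.coords), and the mask could index the raw ABZ_X / ABZ_Y columns directly, provided the polygons are in the same UTM coordinates. geopandas also provides a spatial join, geopandas.sjoin, which may be worth comparing against this.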