I have a Spark SQL DataDrame
with columns latitude and longitude, I am trying to filter rows that fall below a threshold by calculating the distance to an input. My current code looks like. I am using geopy
(great_circle
) for calculating the distance between lat long pairs.
from geopy.distance import great_circle
point = (10, 20)
threshold = 10
filtered_df = df.filter(great_circle(point, (df.lat, df.lon)) < threshold)
When I run this code I get the following error
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
I am confused on which part of the filter expression is wrong.