I have a dataset with the columns VehicleID, eventdatetime, latitude, longitude, and vehicle speed, and over a million rows.
I need to fetch the city, state, and district for each row.
Here is the working Python code snippet for this, using the geopy and reverse_geocoder libraries. I am not able to do the same in PySpark.
if __name__ == "__main__":
    # List of coordinate tuples; it can contain more than one pair.
    # Build (latitude, longitude) pairs from the pandas DataFrame.
    coordinates = list(zip(pdVehicles['gpsLatitude'], pdVehicles['gpslongitude']))
    data = reverseGeocode(coordinates)
    pdVehicles['City_Town'] = [i['name'] for i in data]
    pdVehicles['State'] = [i['admin1'] for i in data]
    pdVehicles['District'] = [i['admin2'] for i in data]
I tried using a UDF, but it didn't work.
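
For illustration, here is a minimal, untested-on-real-data sketch of one way the pandas snippet might be carried over to PySpark. It swaps the per-row UDF for `mapPartitions`, so the geocoder is called once per batch of rows instead of once per row. The names `geocode_partition` and `add_location_columns` are hypothetical helpers, and the default column names `gpsLatitude` and `gpslongitude` are assumed from the snippet above. It also assumes `reverse_geocoder` is installed on every executor and that its `search` results carry the `name`, `admin1`, and `admin2` keys used above.

```python
from itertools import islice


def geocode_partition(rows, search, lat_idx=2, lon_idx=3, batch_size=10000):
    """Yield each row as a tuple extended with (city, state, district).

    rows   : iterator of tuple-like rows (Spark Row objects are tuples)
    search : callable taking a list of (lat, lon) pairs and returning one
             dict per pair with 'name', 'admin1' and 'admin2' keys
    """
    rows = iter(rows)
    while True:
        # Pull a bounded batch so a large partition is never held in full.
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        coords = [(row[lat_idx], row[lon_idx]) for row in batch]
        for row, hit in zip(batch, search(coords)):
            yield tuple(row) + (hit["name"], hit["admin1"], hit["admin2"])


def add_location_columns(df, lat_col="gpsLatitude", lon_col="gpslongitude"):
    """Hypothetical wrapper: returns df with City_Town, State, District added."""
    lat_idx = df.columns.index(lat_col)
    lon_idx = df.columns.index(lon_col)

    def per_partition(rows):
        # Imported on the executor; mode=1 asks for the single-threaded search.
        import reverse_geocoder as rg
        return geocode_partition(
            rows, lambda coords: rg.search(coords, mode=1), lat_idx, lon_idx
        )

    new_columns = df.columns + ["City_Town", "State", "District"]
    return df.rdd.mapPartitions(per_partition).toDF(new_columns)
```

With an active SparkSession and a Spark DataFrame `sdfVehicles` (a hypothetical name), the call would be `add_location_columns(sdfVehicles)`. Schema inference in `toDF` looks at the data, so null values in the first rows may require passing an explicit schema instead.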