I have a dataset with VehicleID, eventdatetime, latitude, longitude, and vehicle speed columns and over a million rows. I need to fetch the city, state, and district for each row's coordinates.

Here is a working Python snippet that does this on a pandas DataFrame using the geopy and reverse_geocoder libraries. I am not able to do the same in PySpark.

if __name__ == "__main__":
    # List of (latitude, longitude) pairs, one per row of the pandas DataFrame.
    coordinates = list(zip(pdVehicles['gpsLatitude'], pdVehicles['gpslongitude']))
    # One result per coordinate pair, in the same order as the input.
    data = reverseGeocode(coordinates)
    pdVehicles['City_Town'] = [i['name'] for i in data]
    pdVehicles['State'] = [i['admin1'] for i in data]
    pdVehicles['District'] = [i['admin2'] for i in data]

I tried using a UDF, but it didn't work.
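
For reference, one direction that might fit here is to geocode per partition instead of per row, so the lookup is called once per batch of coordinates, as in the pandas version above. The following is only a sketch under assumptions: `geocode_partition`, the `search` parameter, and `batch_size` are hypothetical names, the rows are assumed to be tuples whose last two fields are latitude and longitude, and the default lookup is assumed to be `reverse_geocoder.search`, which returns dicts with `name`, `admin1`, and `admin2` keys like the ones used above.

```python
from itertools import islice


def geocode_partition(rows, search=None, batch_size=50_000):
    """Reverse-geocode one partition of rows, one batch at a time.

    rows: iterator of tuples whose last two fields are (latitude, longitude);
          any leading fields (e.g. VehicleID, eventdatetime) are passed through.
    search: callable taking a list of (lat, lon) pairs and returning one dict
            per pair with 'name', 'admin1', 'admin2' keys (assumption: this
            matches reverse_geocoder.search, which is used when search is None).
    """
    if search is None:
        # Imported lazily so the import happens where the function runs.
        import reverse_geocoder as rg
        search = rg.search

    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        # The coordinates are the last two fields of every row.
        coords = [(row[-2], row[-1]) for row in batch]
        for row, hit in zip(batch, search(coords)):
            yield (*row, hit["name"], hit["admin1"], hit["admin2"])


# Hypothetical wiring in PySpark (untested here; column names taken from the
# snippet above, `df` is the Spark DataFrame):
#   cols = ["VehicleID", "eventdatetime", "gpsLatitude", "gpslongitude"]
#   result = (df.select(*cols).rdd
#               .mapPartitions(geocode_partition)
#               .toDF(cols + ["City_Town", "State", "District"]))
```

Batching keeps the number of lookup calls low while bounding how many rows are held in memory at once; the batch size here is an arbitrary starting point, not a tuned value.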

samkart
