
I have one DataFrame that contains station names and their coordinates. For every station, I want to find the nearest station based on those coordinates.

What I have is two functions:

import math
def dist2(lat1, long1, lat2, long2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lat1, long1, lat2, long2 = map(math.radians, [lat1, long1, lat2, long2])
    # haversine formula 
    dlon = long2 - long1 
    dlat = lat2 - lat1 
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a)) 
    # Radius of earth in kilometers is 6371
    km = 6371* c
    return km
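def find_nearest2(lat, lng):
    """Return the name of the station closest to (lat, lng)."""
    # distance in km from (lat, lng) to every station (needs a pandas DataFrame)
    distances = df_onlystations_clean.apply(
        lambda row: dist2(lat, lng, row['lat'], row['lng']),
        axis=1)
    # drop zero distances so a station is not reported as its own nearest neighbour
    distances = distances[distances > 0]
    # idxmin gives the row label of the smallest remaining distance
    return df_onlystations_clean.loc[distances.idxmin(), 'name']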
df_onlystations_clean.apply(
    lambda row: find_nearest2(row['lat'], row['lng']), 
    axis=1)

I always get the same error: 'DataFrame' object has no attribute 'apply'. How can I loop through the DataFrame? What am I doing wrong?

susi0512

1 Answer


That's because a Spark DataFrame is not a pandas DataFrame: you cannot loop over its rows or call the apply method on it.

You have to use the Spark UDF API to apply a Python user-defined function to the data.
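
As a rough sketch of what that could look like, the block below wraps the question's `dist2` in a UDF, pairs every station with every other station through a cross join, and keeps the closest pair per station. The helper name `nearest_stations`, the output column names, and the assumptions that `lat` and `lng` are numeric columns and that station names are unique are all illustrative, not taken from the question.

```python
import math


def dist2(lat1, long1, lat2, long2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lat1, long1, lat2, long2 = map(math.radians, [lat1, long1, lat2, long2])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((long2 - long1) / 2) ** 2)
    # clamp to 1.0 to guard against rounding errors before asin
    return 6371 * 2 * math.asin(min(1.0, math.sqrt(a)))


def nearest_stations(df):
    """Hypothetical helper: df is a Spark DataFrame with name, lat, lng columns."""
    # imported here so dist2 stays usable without a Spark installation
    from pyspark.sql import Window, functions as F
    from pyspark.sql.types import DoubleType

    dist2_udf = F.udf(dist2, DoubleType())

    left = df.select("name", "lat", "lng")
    right = df.select(
        F.col("name").alias("nearest_name"),
        F.col("lat").alias("lat2"),
        F.col("lng").alias("lng2"),
    )
    # every station paired with every other station (assumes unique names)
    pairs = (
        left.crossJoin(right)
        .where(F.col("name") != F.col("nearest_name"))
        .withColumn("dist_km", dist2_udf("lat", "lng", "lat2", "lng2"))
    )
    # keep only the closest candidate for each station
    w = Window.partitionBy("name").orderBy("dist_km")
    return (
        pairs.withColumn("rank", F.row_number().over(w))
        .where(F.col("rank") == 1)
        .select("name", "nearest_name", "dist_km")
    )
```

Calling `nearest_stations(df_onlystations_clean).show()` would then print one row per station. The cross join grows quadratically with the number of stations, so this is only reasonable for a modest station list.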

Yoan B. M.Sc
  • OK, I added this to my code: ``` rdd=df_onlystations_clean.rdd rdd2 = df_onlystations_clean.rdd.map(lambda row: (find_nearest2(row["lat"],row["lng"]))) ``` But I got this error: Could not serialize object: TypeError: can't pickle _thread.RLock objects – susi0512 Feb 08 '22 at 09:40
  • @susi0512, you don't need to convert to an RDD for that; the DataFrame API already has what you need. You'll find the details in this tutorial: https://sparkbyexamples.com/pyspark/pyspark-udf-user-defined-function/ – Yoan B. M.Sc Feb 08 '22 at 13:42
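
The serialization error in the comment above most likely comes from `find_nearest2` referring to the Spark DataFrame itself inside the mapped function, since a DataFrame cannot be pickled and shipped to the workers. One possible workaround, sketched below under the assumption that the station list is small enough to collect to the driver, is to pass plain Python tuples instead. The names `haversine_km`, `nearest_name`, and `nearest_for_all` are hypothetical, and station names are assumed to be unique.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, [lat1, lng1, lat2, lng2])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 6371 * 2 * math.asin(min(1.0, math.sqrt(a)))


def nearest_name(name, lat, lng, stations):
    """Name of the closest other station; stations is a list of (name, lat, lng)."""
    others = [s for s in stations if s[0] != name]
    best = min(others, key=lambda s: haversine_km(lat, lng, s[1], s[2]))
    return best[0]


def nearest_for_all(df):
    """Hypothetical helper: df is a Spark DataFrame with name, lat, lng columns."""
    # plain tuples can be pickled and sent to the workers; the DataFrame cannot
    stations = [(r["name"], r["lat"], r["lng"]) for r in df.collect()]
    return df.rdd.map(
        lambda r: (r["name"], nearest_name(r["name"], r["lat"], r["lng"], stations))
    )
```

With this shape, `nearest_for_all(df_onlystations_clean).collect()` would return a list of `(station, nearest station)` pairs, and the mapped function only closes over a plain list rather than a Spark object.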