I am new to Python, so please forgive any beginner mistakes. I have two datasets: one with 440k rows (file A) and the other with 10k rows (file B). Each file contains latitude/longitude pairs. I am trying to find the haversine distance from each coordinate in file A to each coordinate in file B and then save the results to an output file with the columns lat1, long1, lat2, long2, distance. I looked at the existing questions about for loops, but I didn't quite understand the solutions for avoiding a nested for loop, so I used the following code:
from csv import writer
# df1 and df2 hold file A and file B as pandas DataFrames; haversine() is defined elsewhere
##### Opening new csv file and writing headers #####
with open("distance.csv", "w+") as file:
    csv_writer = writer(file)
    row = ['lat1', 'long1', 'lat2', 'long2', 'distance']
    csv_writer.writerow(row)
#### iterate through each row and calculate the haversine distance ####
for i in range(len(df1)):
    for j in range(len(df2)):
        distance = haversine(df1.loc[i, "long1"], df1.loc[i, "lat1"], df2.loc[j, "long2"], df2.loc[j, "lat2"])
        with open("distance.csv", "a+") as file:
            csv_writer = writer(file)
            # values written in the same order as the header: lat1, long1, lat2, long2, distance
            row = [df1.loc[i, "lat1"], df1.loc[i, "long1"], df2.loc[j, "lat2"], df2.loc[j, "long2"], distance]
            csv_writer.writerow(row)
This approach is very time-consuming. Is there a better approach?
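For reference, here is a rough sketch of one possible vectorized alternative. It is not a definitive solution. It assumes the column names lat1/long1 and lat2/long2 from the code above, that the distance is wanted in kilometres, and a mean Earth radius of 6371 km. The names haversine_matrix, write_pairwise_distances and chunk_size are made up for illustration. The idea is to replace the inner loop with NumPy broadcasting, process file A in chunks so the distance matrix fits in memory, and open the output file only once instead of once per pair.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, distances in km


def haversine_matrix(lat1, lon1, lat2, lon2):
    """Return an array of shape (len(lat1), len(lat2)) with great-circle distances in km."""
    # Column vectors for the first set of points, row vectors for the second,
    # so that broadcasting produces every pairwise combination.
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # clip guards against tiny floating-point overshoot outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def write_pairwise_distances(df1, df2, path, chunk_size=1000):
    """Write every (df1 row, df2 row) pair with its distance to a CSV, one chunk of df1 at a time."""
    lat2 = df2["lat2"].to_numpy(dtype=float)
    lon2 = df2["long2"].to_numpy(dtype=float)
    with open(path, "w", newline="") as f:  # the file is opened once, not once per pair
        for start in range(0, len(df1), chunk_size):
            chunk = df1.iloc[start:start + chunk_size]
            lat1 = chunk["lat1"].to_numpy(dtype=float)
            lon1 = chunk["long1"].to_numpy(dtype=float)
            dist = haversine_matrix(lat1, lon1, lat2, lon2)
            out = pd.DataFrame({
                # repeat/tile line the coordinates up with dist.ravel() (row-major order)
                "lat1": np.repeat(lat1, len(lat2)),
                "long1": np.repeat(lon1, len(lat2)),
                "lat2": np.tile(lat2, len(lat1)),
                "long2": np.tile(lon2, len(lat1)),
                "distance": dist.ravel(),
            })
            # write the header only with the first chunk
            out.to_csv(f, header=(start == 0), index=False)
```

It would be called as write_pairwise_distances(df1, df2, "distance.csv"). Note that 440k × 10k rows is 4.4 billion pairs, so the output file will be very large however the loop is written. If only nearby points are actually needed, filtering each chunk by distance before writing would shrink it considerably.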