I could use some help.
The main problem is calculating the distance between two points given their latitude and longitude. We have divided Brazil into 33k hexagons, listed in the dataframe below:
I've been trying to merge this dataframe with a copy of itself so that I would have a roughly 1-billion-row dataframe with every combination of those hexagons, and then calculate the distances between them with this function:
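import numpy as np

def get_distance(lat_1, lng_1, lat_2, lng_2):
    # Haversine formula; np.sin/np.cos expect the coordinates in radians
    d_lat = lat_2 - lat_1
    d_lng = lng_2 - lng_1
    temp = (
        np.sin(d_lat / 2) ** 2
        + np.cos(lat_1)
        * np.cos(lat_2)
        * np.sin(d_lng / 2) ** 2
    )
    # 6373.0 is the Earth radius in km; the result is scaled by 1.4 and by 1000 (km to m)
    return 6373.0 * (2 * np.arctan2(np.sqrt(temp), np.sqrt(1 - temp))) * 1.4 * 1000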
I tried merging them with Pandas and got a memory error (it needed 8 GB), so I used the Vaex library to convert the data to HDF5 files. However, when I try to merge those with this code, I get the same error:
with h5py.File('mergedfs', 'w') as hdf:
    hdf.create_dataset('datasetmerge', data=dvv.join(dvv2, left_on='key', right_on='key2', how='left', allow_duplication=True))
Has anyone been through something like this before? I appreciate the help in advance.
Also, if you have any alternative solutions, I'd be glad to hear them!
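
For illustration, here is a minimal sketch of one possible alternative that skips the merge entirely: compute the distances block by block with NumPy broadcasting, so only `chunk_size` rows of the all-pairs matrix exist in memory at a time. It applies the same formula and scaling as `get_distance`. The function name `pairwise_distances_chunked`, the `chunk_size` parameter, and the sample coordinates are hypothetical, and it assumes the coordinates are stored in degrees (they are converted to radians first). A full float64 matrix for about 33k hexagons would be roughly 33,000 × 33,000 × 8 bytes ≈ 8.7 GB, which is why the whole result is never built at once here.

```python
import numpy as np

EARTH_RADIUS_KM = 6373.0  # same radius as in get_distance


def pairwise_distances_chunked(lat_deg, lng_deg, chunk_size=500):
    """Yield (start, block) pairs; block[i, j] is the scaled distance in metres
    from hexagon start + i to hexagon j. Inputs are assumed to be in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=np.float64))
    lng = np.radians(np.asarray(lng_deg, dtype=np.float64))
    cos_lat = np.cos(lat)
    for start in range(0, lat.size, chunk_size):
        stop = start + chunk_size
        # Broadcasting: rows are the hexagons in this chunk, columns are all hexagons
        d_lat = lat[None, :] - lat[start:stop, None]
        d_lng = lng[None, :] - lng[start:stop, None]
        temp = (
            np.sin(d_lat / 2) ** 2
            + cos_lat[start:stop, None] * cos_lat[None, :] * np.sin(d_lng / 2) ** 2
        )
        # Guard against rounding pushing temp slightly outside [0, 1]
        temp = np.clip(temp, 0.0, 1.0)
        angle = 2 * np.arctan2(np.sqrt(temp), np.sqrt(1 - temp))
        yield start, EARTH_RADIUS_KM * angle * 1.4 * 1000


# Tiny usage example with three made-up points (degrees)
lat = np.array([-23.5505, -22.9068, -15.7939])
lng = np.array([-46.6333, -43.1729, -47.8828])
blocks = [block for _, block in pairwise_distances_chunked(lat, lng, chunk_size=2)]
full = np.vstack(blocks)  # only sensible for small inputs like this one
```

With the real data, each block could be filtered or written to disk before the next one is computed, instead of stacking them as in the toy example.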