I have 2 DataFrames, the first is for real estate (A) and the second for business types (B). They both have longitude and latitude.
First I measured the distance between the coordinates to filter them by the closest ones (a radius of 2.5km).
Now, I want to add a new column in the real estate database that counts the frequency of the appearance of each type of business around.
For this I create a nested loop to separate the longitude and latitude of the property database and later apply the same in the business database.
The result is the distance in km between (A) and (B). Then filter the locations that were less than 2.5k
Now, I want to create a dictionary that contains all the information from database A, but with new columns that contain the count of the frequency of business type B close to A.
#DataFrame A
df2 = pd.read_csv(cvs_listing_departamentos_venta_jalisco)
#DataFrame B
df_init = df_convertor(cvs_path_file)
list_listing_dic = []
# Iterate over Real Estate DataFrame and separate latitude and longitude
for row in df2.values[:1]:
lat = row[4]
long = row[3]
# Iterate over DataFrame DENUE and separate latitude and longitude
for row2 in df_init.values[:100]:
lat2 = row2[-3]
long2= row2[-2]
# Measures distance from the property compared to Deneu and returns a distance in km
km_distancia = distance.distance((lat, long), (lat2, long2))
# Function to Validate if the distance is within an established radius of the property
dist_valida = validar_km(km_distancia)
# if the distance is validated it creates a new dictionary with the values
if dist_valida == True:
#this is the business I want to count
tipo_negocio = row2[4]
listin_dict ={
'precio':row[0],
'sku': row[13],
tipo_negocio : 1 #this is the row i cant fix
}
list_listing_dic.append(listin_dict)
print(list_listing_dic)
df = pd.DataFrame(list_listing_dic)
df = result.drop_duplicates(subset= 'sku', keep = 'first')
print(df.head())
print(df.info())
This is the output:
[{'precio': 5950000.0, 'sku': 'AP5F77C509A038AMX', 'Restaurantes con servicio de preparación de pizzas, hamburguesas, hot dogs y pollos rostizados para llevar': 1},
{'precio': 5950000.0, 'sku': 'AP5F77C509A038AMX', 'Restaurantes con servicio de preparación de alimentos a la carta o de comida corrida': 1},
{'precio': 5950000.0, 'sku': 'AP5F77C509A038AMX', 'Restaurantes con servicio de preparación de alimentos a la carta o de comida corrida': 1},
{'precio': 5950000.0, 'sku': 'AP5F77C509A038AMX', 'Restaurantes de autoservicio': 1},
{'precio': 5950000.0, 'sku': 'AP5F77C509A038AMX', 'Restaurantes con servicio de preparación de tacos y tortas': 1},
{'precio': 5950000.0, 'sku': 'AP5F77C509A038AMX', 'Restaurantes con servicio de preparación de pescados y mariscos': 1}]
I want this output:
[{'precio':5950000.0, 'sku' :'AP5F77C509A038AMX, 'Restaurantes con servicio de preparación de pizzas, hamburguesas, hot dogs y pollos rostizados para llevar': 1, 'Restaurantes con servicio de preparación de alimentos a la carta o de comida corrida': 2, 'Restaurantes de autoservicio': 1...}]