
I have a dataset with information on international investments in Europe and the coordinates of NUTS3 regions. For each investment I have the city and its coordinates (lat1, long1). I want to compute the distance from each city to each of the NUTS3 regions I have, e.g. Paris to Paris, Paris to Lyon, Paris to Orly, Paris to Maidenhead, and so on. I want to repeat this for all the cities, so that at the end I have a matrix with each city's distance to every NUTS3 region. I tried geosphere, but it only gives me the distance between matching rows.
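
A minimal base-R sketch of the city-by-NUTS3 matrix described above, assuming two hypothetical data frames (`cities` with `long1`/`lat1`, `nuts3` with `long2`/`lat2`) and approximate, purely illustrative coordinates. It swaps in the haversine formula on a sphere of radius 6371 km instead of geosphere's ellipsoidal `distGeo()`, so its values can differ from geosphere's by a fraction of a percent.

```r
# Hypothetical stand-ins for the real data; coordinates are approximate and illustrative
cities <- data.frame(
  city  = c("Paris", "Lyon"),
  long1 = c(2.3522, 4.8357),
  lat1  = c(48.8566, 45.7640)
)
nuts3 <- data.frame(
  nuts  = c("nuts_A", "nuts_B", "nuts_C"),   # hypothetical NUTS3 ids
  long2 = c(2.3522, 4.8357, -0.7190),
  lat2  = c(48.8566, 45.7640, 51.5225)
)

# Great-circle distance in km with the haversine formula (spherical Earth of radius r)
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding slightly above 1
}

# outer() evaluates the function for every (city row, NUTS3 row) pair at once
dist_km <- outer(
  seq_len(nrow(cities)), seq_len(nrow(nuts3)),
  function(i, j) haversine_km(cities$long1[i], cities$lat1[i],
                              nuts3$long2[j], nuts3$lat2[j])
)
dimnames(dist_km) <- list(city = cities$city, nuts = nuts3$nuts)
round(dist_km)
```

Each row of `dist_km` is one city and each column one NUTS3 region, which is the shape asked for; `outer()` works here because `haversine_km()` is vectorised.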

```r
summary(coordinate$NUTS_BN_ID)
summary(fdimkt$NUTS_BN_ID)

## merge datasets
df <- merge(fdimkt, coordinate, by = "nutscode", all = FALSE)
# View(df)  # interactive inspection only
# fix(df)   # opens the data editor; interactive only

# install.packages("dplyr")
library(dplyr)

# rename() returns a new data frame, so assign the result back
df <- df %>% dplyr::rename(lat1 = `_destination_latitude`, long1 = `_destination_longitude`)

library(geosphere)
library(data.table)
# dt <- expand.grid.df(df, df)

# geosphere expects (longitude, latitude) order; distGeo() pairs the two inputs row by row
setDT(df)[, dist_km := distGeo(matrix(c(long1, lat1), ncol = 2),
                               matrix(c(long2, lat2), ncol = 2)) / 1000]
summary(df$dist_km)
```

This didn't work: it returns the distance for each row only, but I actually want the distance from each city to all the NUTS3 coordinates I have.
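
A small sketch of why this happens, using made-up illustrative points: geosphere's `distGeo(p1, p2)` pairs row i of `p1` with row i of `p2`, while `distm(x, y, fun = distGeo)` compares every row of `x` with every row of `y`. The row-wise result is just the diagonal of the all-pairs matrix.

```r
library(geosphere)

# Illustrative (longitude, latitude) points; geosphere expects longitude in column 1
cities <- matrix(c(2.3522, 48.8566,    # Paris
                   4.8357, 45.7640),   # Lyon
                 ncol = 2, byrow = TRUE)
nuts <- matrix(c(2.3522, 48.8566,      # hypothetical NUTS3 point A
                 4.8357, 45.7640,      # hypothetical NUTS3 point B
                 -0.7190, 51.5225),    # hypothetical NUTS3 point C
               ncol = 2, byrow = TRUE)

# distGeo() pairs row i of p1 with row i of p2: one distance per row
row_wise <- distGeo(cities, nuts[1:2, ]) / 1000

# distm() compares every row of x with every row of y: a 2 x 3 matrix
all_pairs <- distm(cities, nuts, fun = distGeo) / 1000

row_wise
round(all_pairs)
```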

Can someone help me with this?

I'm not sure how to post my data table; I guess that would help in getting more suggestions.
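
One common way to share a sample of a data frame in an R question is `dput()`, which prints R code that rebuilds the object. The `df` below is a hypothetical stand-in for the real merged data; on the real data, `dput(head(df, 10))` would print the first ten rows in a form others can paste into their session.

```r
# Hypothetical stand-in for the real data
df <- data.frame(
  city  = c("Paris", "Lyon"),
  long1 = c(2.3522, 4.8357),
  lat1  = c(48.8566, 45.7640)
)

# Prints R code that recreates the first rows; paste that output into the question
dput(head(df, 10))
```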

Dalila
  • You want to use `distm(df)` from the geosphere package. See my answer here for an example: https://stackoverflow.com/questions/58831578/minimum-distance-between-lat-long-across-multiple-data-frames/58841322#58841322. Also note: the geosphere package expects the longitude to be the first column and the latitude to be the second column. – Dave2e Nov 22 '19 at 21:22
  • Thanks @Dave2e, I am checking that out. Just to be clear, I prepared the dataset with longitude first and latitude second. Is it necessary to go through the data.frame step? I have 30,000 rows, I'm new to R for this kind of task, and I don't know whether I have to write every value by hand as in c(12.5667, 45.6789, ...). – Dalila Nov 25 '19 at 16:54
  • Calculating all of the distances between 30,000 points will be time-consuming and memory-intensive, so you may want to break your problem up into smaller parts. Your data frame should have two columns, with the longitude column before the latitude column. If you have questions about getting your data into the correct format, search here or ask a new question, posting a sample of your data.frame. – Dave2e Nov 25 '19 at 18:05
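
Following the comments above, a sketch of how the full computation could be organised for a large data set, assuming hypothetical data frames `investments` (columns `city`, `long1`, `lat1`) and `nuts3` (`nuts`, `long2`, `lat2`). It builds the longitude/latitude matrices with `cbind()` from existing columns, so no coordinates are typed by hand, removes duplicate cities first, and calls geosphere's `distm()` in chunks. `chunk_size = 1000` is an arbitrary choice.

```r
library(geosphere)

# Hypothetical stand-ins for the real data; column names and coordinates are assumptions
investments <- data.frame(
  city  = c("Paris", "Paris", "Lyon"),
  long1 = c(2.3522, 2.3522, 4.8357),
  lat1  = c(48.8566, 48.8566, 45.7640)
)
nuts3 <- data.frame(
  nuts  = c("nuts_A", "nuts_B", "nuts_C"),
  long2 = c(2.3522, 4.8357, -0.7190),
  lat2  = c(48.8566, 45.7640, 51.5225)
)

# Many investments share a city, so compute distances once per distinct city
cities <- unique(investments[, c("city", "long1", "lat1")])

# Build (longitude, latitude) matrices directly from the columns
city_xy <- cbind(cities$long1, cities$lat1)
nuts_xy <- cbind(nuts3$long2, nuts3$lat2)

# Each distm() call handles at most chunk_size cities, which bounds the size of
# the intermediate matrices; the combined result still has one row per distinct city
chunk_size <- 1000
idx <- seq_len(nrow(city_xy))
chunks <- split(idx, ceiling(idx / chunk_size))
dist_km <- do.call(rbind, lapply(chunks, function(rows) {
  distm(city_xy[rows, , drop = FALSE], nuts_xy, fun = distGeo) / 1000
}))
dimnames(dist_km) <- list(city = cities$city, nuts = nuts3$nuts)

# Optional long format: one row per (city, NUTS3) pair
dist_long <- as.data.frame(as.table(dist_km), responseName = "dist_km")
head(dist_long)
```

Chunking only bounds the intermediate matrices. If the final city-by-NUTS3 matrix is itself too large, each chunk could instead be written to disk or reduced (for example to the nearest NUTS3 per city) rather than combined with `rbind()`.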

0 Answers