I have the following data frame with three variables (Location, Latitude, and Longitude) within each group. I would like to calculate the Euclidean distance between all unique location combinations within each group. For instance, based on the data frame below, group A has the pairs (London, Zurich), (Zurich, New York), and (New York, London), and group B similarly has the pair (New York, London).

The average of these unique pairwise distances then needs to be calculated for each group.

euc_dist <- function(x1, x2){
 return(sqrt(sum((x1 - x2)^2)))
}

id  Group  Location Latitude  Longitude
1    A     London    1         2
2    A     New York  3         4
3    A     Zurich    5         6
4    B     New York  7         8
5    B     New York  9         10
6    B     London    11        12

The output should look like:

id  Group  Average Euclidean distance
1    A      xx
2    B      xx

Thank you in advance!

9834

1 Answer

Here's a dplyr solution:

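library(dplyr)

# Rebuild the example data (coordinates only; Location is not used here)
data.frame(Group=gl(2, 3, labels = c("A", "B")),
           Latitude=seq(1, 11, 2),
           Longitude=seq(2, 12, 2)) %>%
  group_by(Group) %>%
  # Average all pairwise distances within each group
  summarise(mean_dist=mean(dist(cbind(Latitude, Longitude))))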

(R's dist function computes Euclidean distance by default and returns each unordered pair of rows exactly once, so taking its mean gives the average over unique pairs. It is also very efficient.)

# A tibble: 2 x 2
  Group mean_dist
  <fct>     <dbl>
1 A          3.77
2 B          3.77
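
As a cross-check of the group A figure, here is a minimal base-R sketch that computes the same average by hand with the euc_dist helper from the question. The names group_a, pair_idx, pair_dists, and mean_pairwise are introduced only for this illustration.

```r
# Same helper as in the question
euc_dist <- function(x1, x2) sqrt(sum((x1 - x2)^2))

# Group A coordinates from the question, one row per location
group_a <- matrix(c(1, 2,
                    3, 4,
                    5, 6),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(c("London", "New York", "Zurich"),
                                  c("Latitude", "Longitude")))

# Every unordered pair of row indices: (1,2), (1,3), (2,3)
pair_idx <- combn(nrow(group_a), 2)

# Distance for each pair, then the average over the pairs
pair_dists <- apply(pair_idx, 2, function(p) {
  euc_dist(group_a[p[1], ], group_a[p[2], ])
})
mean_pairwise <- mean(pair_dists)

# Should agree with the dist()-based shortcut used in the answer
mean_dist <- mean(dist(group_a))
```

Both mean_pairwise and mean_dist come out at about 3.77, which matches the tibble above. The explicit loop over pairs is only there to show what dist is doing; for real data the dist version is the one to use.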

I'm not totally clear on what "unique locations" means here, because each location should have only a single latitude and longitude, correct?
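
If "unique" is meant to collapse repeated locations within a group (group B lists New York twice), one possible reading is to keep only the first row per Group and Location before computing distances. That choice is an assumption, not something the question states; the sketch below uses dplyr's distinct with .keep_all = TRUE, and the names df and result are just for illustration.

```r
library(dplyr)

# Data from the question, this time including the Location column
df <- data.frame(
  Group     = c("A", "A", "A", "B", "B", "B"),
  Location  = c("London", "New York", "Zurich", "New York", "New York", "London"),
  Latitude  = c(1, 3, 5, 7, 9, 11),
  Longitude = c(2, 4, 6, 8, 10, 12)
)

result <- df %>%
  # Assumption: a repeated Location within a Group is a duplicate,
  # and the first occurrence is the one to keep
  distinct(Group, Location, .keep_all = TRUE) %>%
  group_by(Group) %>%
  # Average of all pairwise Euclidean distances per group
  summarise(mean_dist = mean(dist(cbind(Latitude, Longitude))))
```

Under that assumption group A is unchanged at about 3.77, while group B reduces to the single New York and London pair, sqrt(32) or about 5.66. A group left with only one location has no pairs, so its mean_dist would be NaN.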

Dubukay