I'm sure this has been answered before, but I can't find the thread for the life of me!

I am trying to use R to produce a list of all the distances between pairs of xy coordinates in a data frame. The data is stored something like this:

ID = c('1','2','3','4','5','6','7')
x = c(1,2,4,5,1,3,1)
y = c(3,5,6,3,1,5,1)
df= data.frame(ID,x,y)

At the moment I can calculate the distance between two points using:

length = sqrt((x1 - x2)^2 + (y1 - y2)^2)

However, I am uncertain as to where to go next. Should I use something from plyr or a for loop?

Thanks for any help!

unknown
  • For which pairs exactly do you want to calculate the distance? Is there an x1, x2, y1 and y2 coordinate for each ID? – PaulH Dec 06 '16 at 16:03
  • Hi Paul, no, each ID is a point, and I would like to calculate the distance between each point and every other point – unknown Dec 06 '16 at 16:10
  • If you need the output as a list of {x, y} combinations, you can start with `expand.grid`; if you prefer a matrix, you should start with `dist` – agenis Dec 06 '16 at 16:12

2 Answers

Have you tried `?dist`? The formula you listed is the Euclidean distance, which is the default method of `dist`:

dist(df[,-1]) 
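
If a list of pairs is wanted rather than a matrix, the `dist` result can be reshaped. The following is only a sketch, assuming the question's `df`; the names `coords`, `m`, `idx`, and `pairs` are illustrative and not part of the original answer:

```r
# Data from the question
ID <- c('1','2','3','4','5','6','7')
x <- c(1,2,4,5,1,3,1)
y <- c(3,5,6,3,1,5,1)
df <- data.frame(ID, x, y)

# Use the IDs as row names so the distances stay labelled by ID
coords <- df[, c("x", "y")]
rownames(coords) <- df$ID

d <- dist(coords)   # "dist" object holding each unordered pair once
m <- as.matrix(d)   # full symmetric matrix with a zero diagonal

# Long format: one row per unordered pair, taken from the lower triangle
idx <- which(lower.tri(m), arr.ind = TRUE)
pairs <- data.frame(
  ID1  = rownames(m)[idx[, "col"]],
  ID2  = rownames(m)[idx[, "row"]],
  dist = m[idx]
)
```

Here `m["1", "2"]` looks up a single distance by ID, while `pairs` holds one row for each of the 21 distinct pairs among the 7 points.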
DDrake

You can use a self-join to get all combinations and then apply your distance formula. All of this is easily doable using the tidyverse (a collection of packages from Hadley Wickham):

# Load the tidyverse
library(tidyverse)

# Set up a fake key to join on (just a constant)
df <- df %>% mutate(k = 1) 

# Perform the join, remove the key, then create the distance
df %>% 
 full_join(df, by = "k") %>% 
 mutate(dist = sqrt((x.x - x.y)^2 + (y.x - y.y)^2)) %>%
 select(-k)

N.B. This method also calculates the distance between each point and itself (as well as to all other points). It's easy to filter those rows out, though:

df %>% 
 full_join(df, by = "k") %>% 
 filter(ID.x != ID.y) %>%
 mutate(dist = sqrt((x.x - x.y)^2 + (y.x - y.y)^2)) %>%
 select(-k)

For more information about using the tidyverse set of packages I'd recommend R for Data Science or the tidyverse website.

Jim Leach
  • For anyone in the future looking at this. If you need to delete the duplicates (e.g. 1 to 2 and 2 to 1 will give the same distance), then here is a piece of (very ugly, but functional) code that you can use: `IDx = df$ID.x; IDy = df$ID.y; length = df$length; df <- data.frame(IDx,IDy,length); df <- data.frame(t(apply(df, 1, sort))); df <- unique(df); IDx = df$X2; IDy = df$X3; length = as.numeric(paste(df$X1)); df1 <- data.frame(IDx,IDy,length)` – unknown Dec 08 '16 at 11:39
  • @jim Leach I need the exact algorithm for my C++ implementation. I want to find the distance between each pair of points and filter out duplicate points – sameer karjatkar Nov 07 '21 at 08:03
  • @sameerkarjatkar not sure I can help there, sorry (I don't know C++!). But that distance calculation is the Euclidean distance so looking for that or some kind of distance matrix calculation in C++ is probably a safe bet. – Jim Leach Nov 08 '21 at 14:37
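
The duplicate removal discussed in the comments can also be sketched in base R, without sorting and `unique`. This is only an illustration, not the answerer's method: it assumes the question's `df`, adds a hypothetical row-position column `i`, and relies on `merge(..., by = NULL)` returning the Cartesian product of its inputs. Keeping only rows with `i.x < i.y` drops both the self-pairs and the mirrored copy of each pair, which is the same rule as a nested loop over `j > i`:

```r
# Data from the question
ID <- c('1','2','3','4','5','6','7')
x <- c(1,2,4,5,1,3,1)
y <- c(3,5,6,3,1,5,1)
df <- data.frame(ID, x, y)

# Hypothetical helper column: the row position of each point
df$i <- seq_len(nrow(df))

# by = NULL gives every row paired with every row;
# shared column names receive the suffixes .x and .y
all_pairs <- merge(df, df, by = NULL)

# i.x < i.y keeps each unordered pair exactly once and removes self-pairs
unique_pairs <- all_pairs[all_pairs$i.x < all_pairs$i.y, ]
unique_pairs$dist <- with(unique_pairs, sqrt((x.x - x.y)^2 + (y.x - y.y)^2))
unique_pairs <- unique_pairs[, c("ID.x", "ID.y", "dist")]
```

Comparing row positions rather than the `ID` strings avoids surprises from lexicographic ordering (for example `"10" < "2"`) if the IDs ever go beyond a single digit.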