Suppose we have two different datasets:

X1 = c(1,2,4,5,1,3,1)
Y1 = c(3,5,6,3,1,5,1)
df1= data.frame(X1,Y1)

X2 = c(2,3,4,3,2,3,2)
Y2 = c(3,4,2,6,4,3,4)
df2= data.frame(X2,Y2)

These data are represented in a scatterplot (image not included).

I would like to calculate the distances between the 7 XY coordinates in df1 (black open dots) and the 7 XY coordinates in df2 (red open triangles).

I know how to calculate the distances between the XY coordinates within a single dataset using dist() and cbind(), but I don't know how to do the same for XY coordinates in two different datasets.

Using the two datasets, we would obtain a table of 7 columns and 7 rows, filled with the distances between every coordinate in df1 and every coordinate in df2. Column names would be the coordinates in df1 and row names would be the coordinates in df2.

How can I get this data frame with all t

antecessor
  • Use two nested for loops, or lapply inside lapply, to call `dist()`, then combine the results with `rbind.data.frame()` or `data.table::rbindlist()` – abhiieor Feb 16 '18 at 10:02
  • Do you use a particular distance formula i.e. distance as a function of X1, Y1, X2 and Y2? – nghauran Feb 16 '18 at 10:03
  • I used dist(cbind(cbind(df1$X1, df1$Y1), cbind(df2$X2, df2$Y2))), but the results are not what I expected. I get a matrix, but it does not give what I want. Furthermore, if the datasets have different numbers of XY coordinates, it returns an ERROR (which is obvious). @abhiieor could you please develop your proposal in my example? – antecessor Feb 16 '18 at 10:07
  • names(df1) <- c('X', 'Y') names(df2) <- c('X', 'Y') cnt <- 1 foo <- list() for (i in 1:nrow(df1)){ for(j in 1:nrow(df2)) { foo[[cnt]] <- cbind(i, j, dist(rbind(df1[i,], df2[j,]))) cnt <- cnt+1 } } – abhiieor Feb 16 '18 at 10:19
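
The nested-loop idea from the comment above can be sketched as a runnable snippet. This is only an illustration under assumptions: the name `cross_dist` and the `df1_`/`df2_` labels are made up here, and the columns are renamed to X and Y as the comment suggests so that `rbind()` lines them up.

```r
# Same example data as in the question, with matching column names
df1 <- data.frame(X = c(1, 2, 4, 5, 1, 3, 1), Y = c(3, 5, 6, 3, 1, 5, 1))
df2 <- data.frame(X = c(2, 3, 4, 3, 2, 3, 2), Y = c(3, 4, 2, 6, 4, 3, 4))

# Pre-allocate the result: one row per point in df2, one column per point in df1
# (cross_dist and the df1_/df2_ labels are hypothetical names for this sketch)
cross_dist <- matrix(NA_real_, nrow = nrow(df2), ncol = nrow(df1),
                     dimnames = list(paste0("df2_", seq_len(nrow(df2))),
                                     paste0("df1_", seq_len(nrow(df1)))))

for (i in seq_len(nrow(df1))) {
  for (j in seq_len(nrow(df2))) {
    # dist() on a two-row table yields the single Euclidean distance between them
    cross_dist[j, i] <- as.numeric(dist(rbind(df1[i, ], df2[j, ])))
  }
}

cross_dist
```

The loop calls `dist()` once per pair, so it is easy to follow but probably slow for large datasets.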

1 Answer

Maybe this strategy will help:

X1 = c(1,2,4,5,1,3,1)
Y1 = c(3,5,6,3,1,5,1)
df1= data.frame(X1,Y1) 

X2 = c(2,3,4,3,2,3,2)
Y2 = c(3,4,2,6,4,3,4)
df2= data.frame(X2,Y2)

library(tidyverse)

# rename the coordinates to common names and keep a label for the source dataset
df1 = df1 %>% mutate(df_type = "data1") %>% select(X = X1, Y = Y1, df_type)

df2 = df2 %>% mutate(df_type = "data2")  %>% select(X = X2, Y = Y2, df_type)

# link data frames by row
df = bind_rows(df1, df2)

dist(cbind(df$X,df$Y))

   1        2        3        4        5        6        7        8        9       10       11       12       13
2  2.236068                                                                                                            
3  4.242641 2.236068                                                                                                   
4  4.000000 3.605551 3.162278                                                                                          
5  2.000000 4.123106 5.830952 4.472136                                                                                 
6  2.828427 1.000000 1.414214 2.828427 4.472136                                                                        
7  2.000000 4.123106 5.830952 4.472136 0.000000 4.472136                                                               
8  1.000000 2.000000 3.605551 3.000000 2.236068 2.236068 2.236068                                                      
9  2.236068 1.414214 2.236068 2.236068 3.605551 1.000000 3.605551 1.414214                                             
10 3.162278 3.605551 4.000000 1.414214 3.162278 3.162278 3.162278 2.236068 2.236068                                    
11 3.605551 1.414214 1.000000 3.605551 5.385165 1.000000 5.385165 3.162278 2.000000 4.123106                           
12 1.414214 1.000000 2.828427 3.162278 3.162278 1.414214 3.162278 1.000000 1.000000 2.828427 2.236068                  
13 2.000000 2.236068 3.162278 2.000000 2.828427 2.000000 2.828427 1.000000 1.000000 1.414214 3.000000 1.414214         
14 1.414214 1.000000 2.828427 3.162278 3.162278 1.414214 3.162278 1.000000 1.000000 2.828427 2.236068 0.000000 1.414214

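Rows and columns 1 to 7 of this matrix are the points from df1, and 8 to 14 are the points from df2. You can then create a data.frame with the distances between the two sets of points. First we need to transform the dist object into a data frame (its columns are named X1 to X14):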

df_dist = data.frame(as.matrix(dist(cbind(df$X,df$Y))))

With a bit of manipulation, it is possible to keep only the distances between the points of df1 (labelled X1 to X7) and the points of df2 (labelled Y1 to Y7):

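df_dist_x = df_dist %>% select(X1:X7) %>%    # columns 1-7: points from df1
  mutate(row.1 = 1:nrow(df_dist)) %>% 
  filter(row.1 >= 8) %>%                     # rows 8-14: points from df2
  mutate(Y = paste0("Y",row_number())) %>%   # label the df2 points Y1..Y7
  gather(X, distance, X1:X7) %>%             # reshape to long format
  select(X, Y, distance)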

head(df_dist_x)
   X  Y distance
1 X1 Y1 1.000000
2 X1 Y2 2.236068
3 X1 Y3 3.162278
4 X1 Y4 3.605551
5 X1 Y5 1.414214
6 X1 Y6 2.000000
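
If a wide 7 x 7 table is preferred over the long format, one possible shortcut is to slice the between-dataset block straight out of the full distance matrix. The sketch below assumes the df1 points are stacked first and the df2 points second, as above; the names `pts`, `full`, `cross` and `wide` are only illustrative.

```r
X1 <- c(1, 2, 4, 5, 1, 3, 1)
Y1 <- c(3, 5, 6, 3, 1, 5, 1)
X2 <- c(2, 3, 4, 3, 2, 3, 2)
Y2 <- c(3, 4, 2, 6, 4, 3, 4)

# Stack both sets of points: rows 1-7 come from df1, rows 8-14 from df2
pts  <- rbind(cbind(X1, Y1), cbind(X2, Y2))
full <- as.matrix(dist(pts))          # full 14 x 14 Euclidean distance matrix

n1 <- length(X1)
n2 <- length(X2)

# Keep only the block with df2 points as rows and df1 points as columns
cross <- full[n1 + seq_len(n2), seq_len(n1)]
dimnames(cross) <- list(paste0("df2_", seq_len(n2)),
                        paste0("df1_", seq_len(n1)))

wide <- as.data.frame(cross)
wide
```

This still computes the within-dataset distances and then discards them, which is fine for 14 points but wasteful for large inputs.
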
Edu
  • It seems OK, but in this case you are measuring not only the distances between datasets, but also between coordinates within each dataset, right? How can I get 7 matrices? I focus on 7 matrices because I need to use them separately. – antecessor Feb 16 '18 at 10:20
  • Yes, I calculated the distances between the X and Y of both datasets. I'm not sure I am understanding right what you mean by "7 matrix", but I would transform this matrix into a data frame and try to filter information that is not repeated. – Edu Feb 16 '18 at 10:25
  • Now I understand that what I would obtain is not 7 matrices, but 7 columns, each representing the distances between one coordinate in df1 and the coordinates in df2. For example, the first column would have 7 rows, each giving the distance between the first coordinate in df1 and one of the 7 coordinates in df2. The second column would do the same for the second coordinate in df1 and the 7 coordinates in df2, and so on. I am updating my question. Any idea on how to get this new data frame? – antecessor Feb 16 '18 at 10:31
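
The table described in the last comment (one column per coordinate in df1, one row per coordinate in df2) could also be built without `dist()`, by applying the Euclidean formula to all pairwise coordinate differences with base R's `outer()`. This is a sketch that assumes Euclidean distance is what is wanted; `dx`, `dy`, `dist_df` and the row and column labels are hypothetical names.

```r
df1 <- data.frame(X1 = c(1, 2, 4, 5, 1, 3, 1), Y1 = c(3, 5, 6, 3, 1, 5, 1))
df2 <- data.frame(X2 = c(2, 3, 4, 3, 2, 3, 2), Y2 = c(3, 4, 2, 6, 4, 3, 4))

# outer() returns every pairwise difference: rows follow df2, columns follow df1
dx <- outer(df2$X2, df1$X1, "-")
dy <- outer(df2$Y2, df1$Y1, "-")

# Euclidean distance between each df2 point (row) and each df1 point (column)
dist_df <- as.data.frame(sqrt(dx^2 + dy^2))
names(dist_df)    <- paste0("df1_", seq_len(nrow(df1)))
rownames(dist_df) <- paste0("df2_", seq_len(nrow(df2)))

dist_df
```

Because only between-dataset pairs are computed, this should also work when df1 and df2 have different numbers of rows.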