
I have a distance matrix:

> mat
          hydrogen   helium  lithium beryllium    boron
hydrogen  0.000000 2.065564 3.940308  2.647510 2.671674
helium    2.065564 0.000000 2.365661  1.697749 1.319400
lithium   3.940308 2.365661 0.000000  3.188148 2.411567
beryllium 2.647510 1.697749 3.188148  0.000000 2.499369
boron     2.671674 1.319400 2.411567  2.499369 0.000000

And a data frame:

> results

El1      El2    Score
Helium Hydrogen   92
Boron   Helium    61
Boron  Lithium    88

I want to look up the distance between the two elements in each row of results$El1 and results$El2 and add it as a new column, to get the following:

> results

El1      El2    Score   Dist
Helium Hydrogen   92    2.065564
Boron   Helium    61    1.319400
Boron  Lithium    88    2.411567

I did this with a for loop but it seems really clunky. Is there a more elegant way to search and extract distances with fewer lines of code?

Here is my current code:

el.names <- row.names(mat)
num.results <- nrow(results)
# match() is case-sensitive, so lower-case the names to match the matrix labels
El1 <- match(tolower(results$El1), el.names)
El2 <- match(tolower(results$El2), el.names)
el.dist <- numeric(num.results)
for (i in seq_len(num.results)) {
  el.dist[i] <- mat[El1[i], El2[i]]
}
results$Dist <- el.dist
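To try the code in this thread without the original objects, the two inputs can be rebuilt from the printouts above. This is a sketch under a few assumptions: the values are copied as printed (six decimals), Score is stored as numeric, and stringsAsFactors = FALSE is used so that El1 and El2 are character vectors.

```r
# Rebuild the distance matrix from the printout (values copied as shown)
elements <- c("hydrogen", "helium", "lithium", "beryllium", "boron")
mat <- matrix(
  c(0.000000, 2.065564, 3.940308, 2.647510, 2.671674,
    2.065564, 0.000000, 2.365661, 1.697749, 1.319400,
    3.940308, 2.365661, 0.000000, 3.188148, 2.411567,
    2.647510, 1.697749, 3.188148, 0.000000, 2.499369,
    2.671674, 1.319400, 2.411567, 2.499369, 0.000000),
  nrow = 5, byrow = TRUE,
  dimnames = list(elements, elements)
)

# Rebuild the data frame; stringsAsFactors = FALSE keeps El1/El2 as character
results <- data.frame(
  El1 = c("Helium", "Boron", "Boron"),
  El2 = c("Hydrogen", "Helium", "Lithium"),
  Score = c(92, 61, 88),
  stringsAsFactors = FALSE
)
```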

2 Answers

cols <- match(tolower(results$El1), colnames(mat))
rows <- match(tolower(results$El2), colnames(mat))
results$Dist <- mat[cbind(rows, cols)]
results
     El1      El2 Score     Dist
1 Helium Hydrogen    92 2.065564
2  Boron   Helium    61 1.319400
3  Boron  Lithium    88 2.411567

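You'll recognize most of the code. The part to focus on is mat[cbind(rows, cols)]. A matrix can be subset by another matrix that has one column per dimension (two, for a matrix): each row of the index matrix gives the (row, column) position of one element, and the result is a vector of those elements. From the ?`[` help: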

When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
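A small standalone illustration of that rule, using a hypothetical 3-by-3 matrix m rather than the question's data. It also shows the character-matrix form, which ?`[` documents for arrays that have dimnames: each row of the index matrix is matched against the row and column names instead of positions.

```r
# Hypothetical labelled matrix (filled column by column):
#   x y z
# a 1 4 7
# b 2 5 8
# c 3 6 9
m <- matrix(1:9, nrow = 3, dimnames = list(c("a", "b", "c"), c("x", "y", "z")))

# Numeric index matrix: each row is one (row, column) pair
idx <- cbind(c(1, 3, 2), c(2, 3, 1))
m[idx]                               # m[1,2], m[3,3], m[2,1] -> 4 9 2

# Character index matrix: pairs are matched against dimnames(m)
m[cbind(c("a", "c"), c("z", "x"))]   # m["a","z"], m["c","x"] -> 7 3
```

Applied to the question's objects, the character form would read mat[cbind(tolower(results$El1), tolower(results$El2))], which drops the match() calls because mat has row and column names.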

  • I deleted the previous comment after finding the problem: a rogue capital letter! – Dex Aug 18 '15 at 03:58
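A "rogue capital letter" is an easy trap here because match() compares strings exactly, so a name that differs only in case returns NA instead of a position. A minimal sketch with a hypothetical el_names vector, plus a guard (an added suggestion, not from the answer) that stops instead of carrying NA positions into the lookup:

```r
# Hypothetical lower-case labels, like the matrix's row names
el_names <- c("hydrogen", "helium", "boron")

# match() is case-sensitive: "Helium" finds nothing, "boron" is at position 3
match(c("Helium", "boron"), el_names)    # NA 3

# Lower-casing first recovers both positions
pos <- match(tolower(c("Helium", "boron")), el_names)
pos                                      # 2 3

# Fail loudly rather than look up distances with NA positions
if (anyNA(pos)) stop("some element names were not found in the matrix")
```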

Another approach

results$Dist <- mapply(function(x, y) mat[tolower(x), tolower(y)],
                       results$El1, results$El2)

This assumes the El1 and El2 columns of results are character vectors rather than factors.

The result:

> results
     El1      El2 Score     Dist
1 Helium Hydrogen    92 2.065564
2  Boron   Helium    61 1.319400
3  Boron  Lithium    88 2.411567
  • Thank you! I just tried this and it worked fine even though `El1` and `El2` are factors. Is it not advised to use `mapply` with factors? – Dex Aug 18 '15 at 02:31
    @user20672 - the factor/character difference will change what results are returned when it is possible to index with an integer **or** a character. A factor is an integer internally... so `x <- c(b=1,a=2)` and then `x[factor(c("a","b"))]` and `x[c("a","b")]` will return different answers. – thelatemail Aug 18 '15 at 02:35
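The example in that comment can be run as is. The last line is an added observation rather than part of the comment: tolower() converts a non-character argument with as.character(), so in the mapply() answer the factor values are likely already lower-case strings by the time they reach mat[ , ], which would explain why it worked with factors.

```r
x <- c(b = 1, a = 2)

# Character subscripts are matched against names(x)
x[c("a", "b")]                         # a = 2, b = 1

# A factor subscript is used via its integer codes (levels "a", "b" -> 1, 2)
x[factor(c("a", "b"))]                 # b = 1, a = 2

# Converting to character first restores name-based indexing
x[as.character(factor(c("a", "b")))]   # a = 2, b = 1

# tolower() coerces a factor to character and returns a plain string
tolower(factor("Helium"))              # "helium"
```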