Finding shortest mean distances per index from a distance matrix in R

Question

I'm helping to put together a spatial R lab for a third-year class, and one of the tasks will be to identify the site that is closest on average (i.e. has the shortest mean distance) to a set of multiple other sites.

I have a distance matrix dist_m, produced with gdistance::costDistance, which looks something like this:

# Sample data
m <- matrix(c(2, 1, 8, 5,
              7, 6, 3, 4,
              9, 3, 2, 8,
              1, 3, 7, 4),
            nrow  = 4,
            ncol  = 4,
            byrow = TRUE)

# Sample distance matrix
dist_m <- dist(m)

dist_m when printed looks like:

          1         2         3
2  8.717798
3  9.899495  5.477226
4  2.645751  7.810250 10.246951

Desired output: From this dist I want to be able to identify the index value (1, 2, 3 or 4) that has the lowest average distance. In this example, it would be index 4, which has an average distance of 6.90. Ideally, I'd also like the mean distance returned too (6.90).

I can find the mean distance of an individual index by doing something like this:

# Convert distance matrix to matrix
m = as.matrix(dist_m)

# Set diagonals and upper triangle to NA
m[upper.tri(m)] = NA
m[m == 0] = NA

# Calculate mean for index
mean(c(m[4,], m[,4]), na.rm = TRUE)

However, I ideally want a solution that identifies the index with the minimum mean distance directly, rather than having to plug in index values manually (the actual dataset will be much larger than this).

As this is for a university class, I'd like to keep any solution as simple as possible: for-loops and apply functions are likely to be difficult to grasp for students with little experience in R.

minem · Answer 1 · 2018-01-09T13:58:11.567

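Try this. Note that m must be the full symmetric matrix with only the diagonal set to NA; with the upper triangle also masked, rowMeans() would average only part of each index's distances:

m <- as.matrix(dist_m)
diag(m) <- NA  # ignore each site's zero distance to itself
rMeans <- rowMeans(m, na.rm = TRUE)
names(rMeans) <- NULL
which(rMeans == min(rMeans, na.rm = TRUE))
# [1] 4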

Or as a function:

minMeanDist <- function(x) {
  # Full symmetric matrix: row i holds every distance for index i
  m <- as.matrix(x)
  # Mask only the self-distances on the diagonal
  diag(m) <- NA
  rMeans <- rowMeans(m, na.rm = TRUE)
  names(rMeans) <- NULL
  mmd <- min(rMeans, na.rm = TRUE)
  ind <- which(rMeans == mmd)
  list(index = ind, min_mean_dist = mmd)
}
minMeanDist(dist_m)
# $index
# [1] 4
# 
# $min_mean_dist
# [1] 6.900984
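
A possibly simpler variant for students, offered as a sketch rather than as part of the original answer: as.matrix() on a dist object returns the full symmetric matrix with zeros on the diagonal, so each row sum divided by (n - 1) is that index's mean distance to the others, and which.min() picks the smallest. The names full, mean_dist and best are illustrative only.

```r
# Sample data from the question
m <- matrix(c(2, 1, 8, 5,
              7, 6, 3, 4,
              9, 3, 2, 8,
              1, 3, 7, 4),
            nrow = 4, ncol = 4, byrow = TRUE)
dist_m <- dist(m)

# Full symmetric matrix: row i holds the distances from index i to every index
full <- as.matrix(dist_m)

# The diagonal is 0, so row sum / (n - 1) is the mean distance to the other indices
mean_dist <- rowSums(full) / (nrow(full) - 1)

# Position of the smallest mean distance (the first one if there is a tie)
best <- which.min(mean_dist)

unname(best)
# [1] 4
mean_dist[[best]]
# [1] 6.900984
```

This avoids NA handling altogether, but it assumes the diagonal is exactly zero and that every pairwise distance is present.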

score 1 · Accepted Answer · answered Jan 09 '18 at 14:56

If you want to use the tidyverse, this is one way:

library(tidyverse)

as.matrix(dist_m) %>%
    as_tibble() %>% # as.tibble() is deprecated in current tibble releases
    rownames_to_column(var = "start_node") %>%
    gather(end_node, dist, -start_node) %>% # go long
    filter(dist != 0) %>% # drop identity diagonal
    group_by(start_node) %>% # now summarise
    summarise(mean_dist = mean(dist)) %>%
    filter(mean_dist == min(mean_dist)) # choose minimum mean_dist

# A tibble: 1 x 2
  start_node mean_dist
       <chr>     <dbl>
1          4  6.900984

It's a little long, but the pipes make it easy to see what is happening at each line, and you get a nice output.
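
For a lab where the tidyverse is not installed, the same "go long, then summarise" idea can be sketched in base R. This is an illustrative equivalent, not the pipeline above; the names full, long, means and result are assumptions made for the example. It drops self-pairs by comparing labels rather than testing for a zero distance, so two distinct sites that happen to be zero distance apart would still be counted.

```r
# Sample data from the question
m <- matrix(c(2, 1, 8, 5,
              7, 6, 3, 4,
              9, 3, 2, 8,
              1, 3, 7, 4),
            nrow = 4, ncol = 4, byrow = TRUE)

# Keep the whole symmetric matrix (do not remove the upper triangle)
full <- as.matrix(dist(m))

# Go long: one row per (start_node, end_node) pair
long <- as.data.frame(as.table(full), stringsAsFactors = FALSE)
names(long) <- c("start_node", "end_node", "dist")

# Drop each node's pairing with itself
long <- long[long$start_node != long$end_node, ]

# Mean distance per start node, then keep the row(s) with the minimum
means <- aggregate(dist ~ start_node, data = long, FUN = mean)
result <- means[means$dist == min(means$dist), ]
result
```

On the sample data, result should hold a single row for start_node "4" with a mean distance of about 6.90.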

This is fantastic; my attempts at a three column solution had failed earlier because I was removing the upper triangle out of habit prior to melting/gathering; melting/gathering the entire matrix works a treat, and `tidyverse` is ideal for this lab! — Robbi Bishop-Taylor, Jan 09 '18 at 23:43
