1

I have this "cost matrix" in R that represents the "cost" of going from any location to any other location (for a total of 5 locations):

X<-matrix(rnorm(25) , nrow = 5)

rownames(X) <- colnames(X) <- c("Location 1", "Location 2", "Location 3", "Location 4", "Location 5")

               Location 1  Location 2  Location 3 Location 4 Location 5
Location 1  0.4501251  2.30029903 -0.26950735  0.1723589  0.5045694
Location 2  1.1208198  1.38557818  0.25250596 -0.6174514 -0.5324785
Location 3  0.4181011  0.01103208  0.83713132 -0.7649082 -0.5619196
Location 4  0.9372365 -1.04258420  0.08397031  0.1611555  1.8356483
Location 5  1.0201278 -0.56020913  1.14815210  1.0362332 -2.2052776

I would like to find out the "Greedy Shortest Path" that starts from "Location 1" and ends at "Location 1" while visiting each location exactly once.

I think this would look something like this (R getting the minimum value for each row in a matrix, and returning the row and column name) - this code returns the smallest value in each row of the matrix:

result <- t(sapply(seq(nrow(X)), function(i) {
  j <- which.min(X[i,])
  c(paste(rownames(X)[i], colnames(X)[j], sep='/'), X[i,j])
}))

When I look at the results:

print(result)


     [,1]                    [,2]                
[1,] "Location 1/Location 3" "-0.269507349140081"
[2,] "Location 2/Location 4" "-0.617451368699149"
[3,] "Location 3/Location 4" "-0.764908186347014"
[4,] "Location 4/Location 2" "-1.04258420123991" 
[5,] "Location 5/Location 5" "-2.20527763537575" 

I think this is telling me that the "Greedy Shortest Path" (starting from "Location 1") is : 1 to 3, 3 to 4, 4 to 2, 2 to 4 ... but then I get stuck in a "2 to 4, 4 to 2" loop for ever.

  • Can someone please show me how I can find the "Greedy Shortest Path" that starts from "Location 1"?

By doing this manually:

  • Starting at Location 1, the "shortest greedy path" is to Location 4
  • From Location 4, the "shortest greedy path" is to Location 3
  • From Location 3, the "shortest greedy path" is to Location 5
  • From Location 5, the "shortest greedy path" is to Location 2 (since we have already been to Location 3 and Location 4, and we can not re-visit the current Location i.e. Location 5, and can not visit Location 1 since there is still a Location we haven't visited)
  • From Location 2, we now have no choice but to return to Location 1 and finish the journey

I would look to produce the following output:

Path : (1,4), (4,3), (3,5), (5,2), (2,1)
Total Distance = -0.8441315 + (-0.7244259) + (-0.3775706) + 0.3796208 + 0.3015059 =  -1.265001
  • Could someone please show me how to modify my code to get this final output?

Thank you!

ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
stats_noob
  • 5,401
  • 4
  • 27
  • 83

2 Answers2

2

This keeps track of visited locations and doesn't check them:

set.seed(123)

n <- 5L
X <- matrix(rnorm(n^2), nrow = n)

rownames(X) <- colnames(X) <- paste("Location", 1:n)

shortest_path <- function(x, start = 1L) {
  n <- nrow(x)
  nn <- 1:n
  used <- c(start, integer(n - 1L))

  for (step in 2:n) {
    used[step] <- nn[-used][which.min(x[used[step - 1L], -used])]
  }
  
  data.frame(path = colnames(x)[used], dist = c(0, x[used[1:(n - 1L)] + n*(used[2:n] - 1L)]))
}

df <- shortest_path(X)
X
#>             Location 1 Location 2 Location 3 Location 4 Location 5
#> Location 1 -0.56047565  1.7150650  1.2240818  1.7869131 -1.0678237
#> Location 2 -0.23017749  0.4609162  0.3598138  0.4978505 -0.2179749
#> Location 3  1.55870831 -1.2650612  0.4007715 -1.9666172 -1.0260044
#> Location 4  0.07050839 -0.6868529  0.1106827  0.7013559 -0.7288912
#> Location 5  0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393
df
#>         path       dist
#> 1 Location 1  0.0000000
#> 2 Location 5 -1.0678237
#> 3 Location 3 -0.5558411
#> 4 Location 4 -1.9666172
#> 5 Location 2 -0.6868529
jblood94
  • 10,340
  • 1
  • 10
  • 15
  • Thank you! I tried your answer on my real data and got the following error: Error in path[1] <- colnames(x)[start] : replacement has length zero . Do you know why this error might be produced? thank you so much! – stats_noob Mar 30 '22 at 21:04
  • My real data has all diagonal elements as 0 , could this be causing the problem? – stats_noob Mar 30 '22 at 21:05
  • I think I just figured out how to fix this problem! I replace all 0 entries (i.e. diagonal elements) with a large number , i..e X[X == 0 ] <- 999999 .... I think now the code runs just fine! – stats_noob Mar 30 '22 at 21:08
  • I updated with faster/simpler code. Give it a try--it doesn't use the diagonals. – jblood94 Mar 30 '22 at 21:25
  • Nice dynamic programming, upvoted! – ThomasIsCoding Mar 30 '22 at 23:10
1

This seems to be a typical Traveling Salesman Problem (TSP), and I believe you can find a bunch of implementation methods.


Here is a base R option by defining a recursive function like below (borrow data from @jblood94's answer)

f <- function(i, S = setdiff(1:ncol(X), i), path = i) {
    if (length(S) == 1) {
        return(list(cost = X[i, S] + X[S, 1], path = c(path, S)))
    }
    vp <- Inf
    for (k in S) {
        r <- Recall(k, setdiff(S, k), c(path, k))
        v <- X[i, k] + r$cost
        if (v <= vp) {
            vp <- v
            l <- list(cost = v, path = r$path)
        }
    }
    l
}

which gives

> f(1)
$cost
[1] -4.507312

$path
[1] 1 5 3 4 2

where

  • f(1) means that the start/end point is 1
  • $cost is the min sum cost
  • $path is the column indices that describe the Hamilton path
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • Thank you so much Tom! I like both of your answers! – stats_noob Mar 31 '22 at 06:21
  • 1
    The Travelling Salesman Problem is about attempting to find the "lowest cost" Hamiltonian Path that covers each vertex in the graph - correct? – stats_noob Mar 31 '22 at 06:22
  • Do you play chess at all? I was reading that the "Knights Tour Problem" (https://en.wikipedia.org/wiki/Knight%27s_tour) is about trying to find a Hamiltonian Path that covers all squares exactly once, and ending back at the starting point. Apparently on a 8 x 8 chessboard, there are many such Hamiltonian Paths. Do you happen to know if some of these Hamiltonian Paths will require fewer moves than others? Thank you so much! – stats_noob Mar 31 '22 at 06:23
  • @antonoyaro8 Yes, your understanding about TSP is correct, which aims to find a Hamiltonian path that covers each vertex in the graph but with lowest total cost. Sorry that I don't play chess. – ThomasIsCoding Mar 31 '22 at 12:00
  • @antonoyaro8 Regarding the "Knights Tour Problem", if you are looking for the Hamiltonian path, isn't the number of moves determined by the number of vertices (you need to traverse all vertices only once, so you will have a fixed number of moves in the end)? – ThomasIsCoding Apr 01 '22 at 08:30