3

I want to code travelling salesman problem in R. I am going to begin with 3 cities at first then I will expand to more cities. distance matrix below gives distance between 3 cities. Objective (if someone doesn't know) is that a salesman will start from a city and will visit 2 other cities such that he has to travel minimum distance.

In below case he should start either from ny or LA and then travel to chicago and then to the remaining city. I need help to define A_ (my constraint matrix).

My decision variables will of same dimension as distances matrix. It will be a 1,0 matrix where 1 represents travel from city equal to row name to a city equal to column name. For instance if a salesman travels from ny to chicago, 2nd element in row 1 will be 1. My column and row names are ny,chicago and LA

By looking at the solution of the problem I concluded that my constraints will be::

Row sums have to be less than 1 as he cannot leave from same city twice

Column sums have to be less than 1 as he cannot enter the same city twice

total sum of matrix elements has to be 2 as the salesman will be visiting 2 cities and leaving from 2 cities.

I need help to define A_ (my constraint matrix). How should I tie in my decision variables into constraints?

 ny=c(999,9,20)
 chicago=c(9,999,11)
 LA=c(20,11,999)
 distances=cbind(ny,chicago,LA)


 dv=matrix(c("a11","a12","a13","a21","a22","a23","a31","a32","a33"),nrow=3,ncol=3)

 c_=c(distances[1,],distances[2,],distances[3,])
 signs = c((rep('<=', 7)))
 b=c(1,1,1,1,1,1,2)
 res = lpSolve::lp('min', c_, A_, signs, b,  all.bin = TRUE)
user2543622
  • 5,760
  • 25
  • 91
  • 159

2 Answers2

2

There are some problems with your solution. The first is that the constraints you have in mind don't guarantee that all the cities will be visited -- for example, the path could just go from NY to LA and then back. This could be solved fairly easily, for example, by requiring that each row and column sum to exactly one rather than at most 1 (although in that case you'd be finding a traveling salesman tour rather than just a path).

The bigger problem is that, even if we fix this problem, your constraints wouldn't guarantee that the selected vertices actually form one cycle through the graph, rather than multiple smaller cycles. And I don't think that your representation of the problem can be made to address this issue.

Here is an implementation of Travelling Salesman using LP. The solution space is of size n^3, where n is the number of rows in the distance matrix. This represents n consecutive copies of the nxn matrix, each of which represents the edge traversed at time t for 1<=t<=n. The constraints guarantee that

  1. At most one edge is traversed each step
  2. Ever vertex is visited exactly once
  3. The startpoint of the i'th edge traversed is the same as the endpoint of the i-1'st

This avoids the problem of multiple small cycles. For example, with four vertices, the sequence (12)(21)(34)(43) would not be a valid solution because the endpoint of the second edge (21) does not match the start point of the third (34).

tspsolve<-function(x){
   diag(x)<-1e10
   ## define some basic constants
   nx<-nrow(x)
   lx<-length(x)
   objective<-matrix(x,lx,nx)
   rowNum<-rep(row(x),nx)
   colNum<-rep(col(x),nx)
   stepNum<-rep(1:nx,each=lx)

   ## these constraints ensure that at most one edge is traversed each step
   onePerStep.con<-do.call(cbind,lapply(1:nx,function(i) 1*(stepNum==i)))
   onePerRow.rhs<-rep(1,nx)

   ## these constraints ensure that each vertex is visited exactly once
   onceEach.con<-do.call(cbind,lapply(1:nx,function(i) 1*(rowNum==i)))
   onceEach.rhs<-rep(1,nx)

   ## these constraints ensure that the start point of the i'th edge
   ## is equal to the endpoint of the (i-1)'st edge
   edge.con<-c()
   for(s in 1:nx){
     s1<-(s %% nx)+1    
     stepMask<-(stepNum == s)*1
     nextStepMask<- -(stepNum== s1)
     for(i in 1:nx){        
       edge.con<-cbind(edge.con,stepMask * (colNum==i) + nextStepMask*(rowNum==i))
     }
   }
   edge.rhs<-rep(0,ncol(edge.con))

   ## now bind all the constraints together, along with right-hand sides, and signs
   constraints<-cbind(onePerStep.con,onceEach.con,edge.con)
   rhs<-c(onePerRow.rhs,onceEach.rhs,edge.rhs)
   signs<-rep("==",length(rhs))
   list(constraints,rhs)

   ## call the lp solver
   res<-lp("min",objective,constraints,signs,rhs,transpose=F,all.bin=T)

   ## print the output of lp
   print(res)

   ## return the results as a sequence of vertices, and the score = total cycle length
   list(cycle=colNum[res$solution==1],score=res$objval)
}

Here is an example:

set.seed(123)
x<-matrix(runif(16),c(4,4))
x
##           [,1]      [,2]      [,3]      [,4]
## [1,] 0.2875775 0.9404673 0.5514350 0.6775706
## [2,] 0.7883051 0.0455565 0.4566147 0.5726334
## [3,] 0.4089769 0.5281055 0.9568333 0.1029247
## [4,] 0.8830174 0.8924190 0.4533342 0.8998250
tspsolve(x)
## Success: the objective function is 2.335084 
## $cycle
## [1] 1 3 4 2
## 
## $score
## [1] 2.335084

We can check the correctness of this answer by using a primitive brute force search:

tspscore<-function(x,solution){
    sum(sapply(1:nrow(x), function(i) x[solution[i],solution[(i%%nrow(x))+1]])) 
}

tspbrute<-function(x,trials){
  score<-Inf
  cycle<-c()
  nx<-nrow(x)
  for(i in 1:trials){
    temp<-sample(nx)
    tempscore<-tspscore(x,temp)
    if(tempscore<score){
      score<-tempscore
      cycle<-temp
    }
  }
  list(cycle=cycle,score=score)
}

tspbrute(x,100)
## $cycle
## [1] 3 4 2 1
## 
## $score
## [1] 2.335084

Note that, even though these answers are nominally different, they represent the same cycle.

For larger graphs, though, the brute force approach doesn't work:

> set.seed(123)
> x<-matrix(runif(100),10,10)
> tspsolve(x)
Success: the objective function is 1.296656 
$cycle
 [1]  1 10  3  9  5  4  8  2  7  6

$score
[1] 1.296656

> tspbrute(x,1000)
$cycle
 [1]  1  5  4  8 10  9  2  7  6  3

$score
[1] 2.104487

This implementation is pretty efficient for small matrices, but, as expected, it starts to deteriorate severely as they get larger. At about 15x15 it starts slowing down quite a bit:

timetsp<-function(x,seed=123){
    set.seed(seed)
    m<-matrix(runif(x*x),x,x)   
    gc()
    system.time(tspsolve(m))[3]
}

sapply(6:16,timetsp)
## elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed 
## 0.011   0.010   0.018   0.153   0.058   0.252   0.984   0.404   1.984  20.003 
## elapsed 
## 5.565
mrip
  • 14,913
  • 4
  • 40
  • 58
  • also please put comments in your code about your main steps..right now i have to go line by line to figure out what is it doing...also in some cases i am not understanding your intuition behind doing the particular step – user2543622 Oct 08 '13 at 23:24
  • i am going through your solution and i will need some time to understand it better...but in mean time have a quick question: it doesn't look like that you are using optimization package such as lpsolve to get the solution....then how are you arriving at the solution? is you solution scalable? lets say if i have 1000 cities.. – user2543622 Oct 08 '13 at 23:53
  • Yes, it uses the lpsolve package -- the call to `lp` is three lines from the bottom of the `tspsolve` function. No, the solution is not scalable. It starts to take a long time at about 15 cities. There is no scalable solution to traveling salesman because this is an NP-complete problem. – mrip Oct 09 '13 at 13:17
  • I added some comments. Hopefully that helps. – mrip Oct 09 '13 at 13:21
  • QQ, in below call: res<-lp("min",objective,constraints,signs,rhs,transpose=F,all.bin=T), I noticed that objective is 16*4 matrix...Doesn't it need to be a matrix with a single column/row of 64 elements? Aren't there 64 decision variables? Let me know. Thx – user2543622 Oct 15 '13 at 15:54
  • You are correct: the objective needs to be a vector. However, internally in R, a matrix is just a vector with a dimension attribute. Since lp is expecting a vector, it just ignores the dimensions. It would have been safer to pass `as.vector(objective)`, because some functions behave differently with matrices versus vectors, but for `lp` the result is the same. The reason I keep `objective` as a matrix is just because it is easier conceptually to think about this way. – mrip Oct 15 '13 at 16:18
  • When i type res$solution, i get vector of 0 and 1...How do i read that vector? Vector's 2, 25, 47, 56 elements are 1, how do they signify particular path? does second element means path from city 2 to city 1? – user2543622 Oct 15 '13 at 23:00
  • See the last line: `list(cycle=colNum[res$solution==1],score=res$objval)`. `cycle` here will give you the order in which the cities are traversed. – mrip Oct 15 '13 at 23:04
  • Well, it returns a cycle, so there's no "start" -- this algorithm solves the traveling salesman tour problem. The order of the cities is given by `cycle`, whatever that is. I can't tell you just by looking at res$solution. – mrip Oct 15 '13 at 23:16
1

You can use the gaoptim package to solve permutation/real valued problems - it's pure R, so it's not so fast:

Euro tour problem (see ?optim)

 eurodistmat = as.matrix(eurodist)

 # Fitness function (we'll perform a maximization, so invert it)
 distance = function(sq)
 {
   sq = c(sq, sq[1])
   sq2 <- embed(sq, 2)
   1/sum(eurodistmat[cbind(sq2[,2], sq2[,1])])
 }

 loc = -cmdscale(eurodist, add = TRUE)$points
 x = loc[, 1]
 y = loc[, 2]
 n = nrow(eurodistmat)

 set.seed(1)

 # solving code
 require(gaoptim)
 ga2 = GAPerm(distance, n, popSize = 100, mutRate = 0.3)
 ga2$evolve(200)
 best = ga2$bestIndividual()
 # solving code

 # just transform and plot the results
 best = c(best, best[1])
 best.dist = 1/max(ga2$bestFit())
 res = loc[best, ]
 i = 1:n

 plot(x, y, type = 'n', axes = FALSE, ylab = '', xlab = '')
 title ('Euro tour: TSP with 21 cities')
 mtext(paste('Best distance found:', best.dist))
 arrows(res[i, 1], res[i, 2], res[i + 1, 1], res[i + 1, 2], col = 'red', angle = 10)
 text(x, y, labels(eurodist), cex = 0.8, col = 'gray20')
Community
  • 1
  • 1
Fernando
  • 7,785
  • 6
  • 49
  • 81