
I recently came across a case in an interview where the use case I was asked to solve belongs to the travelling salesman problem / vehicle routing problem family. I was able to tell them what the actual problem is and what maths is involved. I also explained how the use case below can be solved with the MapReduce paradigm in Hadoop (how multiple MapReduce jobs would solve the problem), using the graph algorithms described in the book "Data-Intensive Text Processing with MapReduce" by Jimmy Lin and Chris Dyer.

Out of curiosity I did some research on Google, and I can see that a lot of implementation and research work has been done on this problem in different flavours. The problem I was asked has city coordinates given in (x, y) format, while many solutions I saw on Google consider other factors such as unit distance, negative/positive units of measurement, and so on. In short, the more research and reading I did, the more confused I got.

My question is: for the use case below, what are the possible solutions, and which is the best among them? If some experienced person could shed some light on this, it would help clear up my confusion and let me understand the solution better. Alternatively, someone could point me in the right direction (so that I don't get more confused exploring the whole ocean of solutions).

Use case asked in interview:

A company is trying to find the best possible optimal solution for servicing its customer base of 300 with 12 employees. They want a technology solution that tells them how they will be able to meet customer requirements as the business grows and other things change, such as customer locations changing, new locations being added, and so on.

The problem is basically a form of the Travelling Salesman Problem (TSP) or Vehicle Routing Problem (VRP). The following things need to be completed here.

The starting coordinates are (0,0), and example city coordinates are listed below. These are the coordinates, provided in a text file as input, for which a working solution is expected:

X coordinate    Y Coordinate 
420 278 
421 40 
29  178 
350 47 
298 201 
417 186 
378 134 
447 239 
42  114 
45  199 
362 195 
381 243 
429 1 
338 209 
176 9 
364 26 
326 182 
500 129 
190 51 
489 103 
368 142 
132 260 
305 200 
446 137 
375 154 
440 190 
9   118 
437 32 
383 266 

  1. What is the right way to handle this NP-hard problem, or, if there is no single right way, what are the different approaches and their pros/cons?

  2. Since it's more of an analysis-based problem, can some kind of visualization help solve it, such as a graph, or the use of R/analytic tools?

Let me know if you need more details, or if you can suggest where I can read and understand more.

Thanks in advance

user1188611
  • I'm not the expert you're looking for, so I wouldn't dare post this oversimplified comment as an answer. Basically, you could describe the paths between your coordinates as a graph and then find a Hamiltonian cycle. Many common libraries can calculate those cycles, e.g. [igraph](http://stackoverflow.com/questions/26557533/hamiltonian-path-using-igraph) (I don't know about Hadoop, though). [This question](http://stackoverflow.com/questions/16115942/finding-all-hamiltonian-cycles) refers to a solution in Java. Hope it helps. – lrnzcig Nov 09 '15 at 08:10
  • The number of employees may be a hint that they wanted multiple places of business discussed. `best possible` as well as `optimal` need a goal and some cost function. – greybeard Nov 17 '15 at 11:04

4 Answers

3

There's no single right way of solving an NP-hard problem. Since the known exact algorithms take exponential time in the worst case, finding the true optimum is going to take a very long time for anything other than trivial examples.

However, there are approximations that can get fairly close to the real answer and might be sufficiently good for your application (as in, it's not the shortest path, but it's within some relative range of it).

Check out the Wikipedia page. It even has some examples.
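As an illustration of what such an approximation can look like, here is a minimal sketch of 2-opt local search, one commonly used heuristic (the answer above does not name a specific method, so this choice is an assumption). It starts from an arbitrary tour and keeps reversing segments while that shortens the tour. The function names and the small sample of cities taken from the question are illustrative only, and the result is a local optimum, not necessarily the shortest tour.

```python
import math

def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )

def two_opt(points, order):
    """Reverse segments of the tour for as long as doing so shortens it."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):        # position 0 (the depot) stays fixed
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_length(points, candidate) < tour_length(points, order) - 1e-9:
                    order = candidate
                    improved = True
    return order

# Depot at (0, 0) followed by the first few cities from the question.
points = [(0, 0), (420, 278), (421, 40), (29, 178), (350, 47), (298, 201)]
start = list(range(len(points)))
best = two_opt(points, start)
print(best, round(tour_length(points, best), 1))
```

Recomputing the full tour length for every candidate keeps the sketch short; a real implementation would compare only the two edges that change.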

Martin Gal
Sorin
  • As far as I understand, the topic starter has an mTSP problem (m for multiple), and the salesmen may start from different points (from their homes or home offices, for example). In this situation we are definitely not talking about the best solution at this problem size, and we need some approximation. The Wiki page contains only more or less classical formulations/solutions. – Dmitry Spikhalskiy Nov 21 '15 at 11:13
  • I have clearly stated what you are trying to say. My question is whether anyone can suggest the best possible solution here. – user1188611 Nov 21 '15 at 20:06
2

If I were asked this question in an interview, I would propose something like the approach described in this paper, which looks like the best match for your task's formulation. In the paper you will find an optimized approximate approach to the multiple salesmen problem with all salesmen starting at one point. It can be adapted if we know where the employees live, by solving each single travelling salesman subproblem (clustering divides the main problem into classic problems) with its start at the specific salesman's home/home office.

If we have a graph of places as input, not just coordinates, we can replace k-means with a graph clustering algorithm such as MCL.
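The cluster-first, route-second idea can be sketched roughly as follows. This is not the paper's algorithm, only a minimal illustration under assumptions: plain Lloyd's k-means splits the customers into one group per employee, and each group is then ordered with a simple nearest-neighbour pass starting from a shared depot at (0, 0). The names `kmeans` and `nearest_neighbour_route`, the choice of 3 employees, and the sample of cities from the question are all hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its closest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move every center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels

def nearest_neighbour_route(start, stops):
    """Order the stops greedily: always go to the closest unvisited one."""
    route, current, remaining = [], start, list(stops)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route

customers = [(420, 278), (421, 40), (29, 178), (350, 47), (298, 201),
             (417, 186), (378, 134), (447, 239), (42, 114), (45, 199)]
depot = (0, 0)
n_employees = 3  # hypothetical; the question has 12 employees and 300 customers

labels = kmeans(customers, n_employees)
routes = []
for c in range(n_employees):
    stops = [p for p, label in zip(customers, labels) if label == c]
    routes.append(nearest_neighbour_route(depot, stops))

for employee, route in enumerate(routes):
    print(employee, route)
```

To start each employee from their own home instead, `depot` would be replaced by a per-employee start point, and the per-cluster routing step could be swapped for any stronger single-TSP heuristic.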

Dmitry Spikhalskiy
  • Same comment as above for the other answers: can you suggest an answer to the question asked, rather than suggesting a modification to the question and then answering that? – user1188611 Nov 21 '15 at 20:09
  • "A company is trying to find best possible optimal solution for servicing his customer base of 300 with 12 employee", plus you mentioned that it's a variation of TSP. We have multiple "salesmen", which are the employees of the company. They, logically, have one or multiple "home offices". They need to visit points according to the TSP model. So, from your description, we have a classical mTSP problem, or mTSP with different start points for the salesmen. That is just the question as asked. Additionally, if the input is distances (not coordinates), I suggested MCL for graphs. I didn't suggest any modifications to the problem definition. – Dmitry Spikhalskiy Nov 22 '15 at 14:00
  • @user1188611 In general, I really don't understand your comment. It's just a selection of a formal version of your problem, so that I can point you to related papers. BTW, fortunately, this strict formulation (mTSP) perfectly matches your description :) – Dmitry Spikhalskiy Nov 22 '15 at 14:18
2

Indeed, as Dmitry mentions, this is a case of the multiple travelling salesman problem. Since it is NP-hard, the interviewers are naturally looking for you to suggest a heuristic optimisation algorithm.

I think the key in this case is that they are looking for an algorithm that can update in real time as the number and location of destinations change. Ant colony optimisation (a swarm intelligence method) was actually initially formulated for the travelling salesman problem; see the paper and Wikipedia:

https://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms

M. Dorigo, V. Maniezzo, and A. Colorni, "Ant system: optimization by a colony of cooperating agents", IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 26, no. 1, pp. 29-41, 1996.

This has since been generalised to the multiple travelling salesman problem; see for example this paper (open access) for some nice work on it:

http://www.researchgate.net/publication/263389346_Multi-type_ant_colony_system_for_solving_the_multiple_traveling_salesman_problem

In an interview situation, I would list its pros as: 1. it is an efficient heuristic solution; 2. it can update in real time to changes in the graph; 3. for bonus points, I would mention that once a reasonably efficient solution has been obtained in silico, drivers themselves could be assigned routes in a slightly probabilistic way, so that further optimisation could subsequently be driven by real data.

The cons are that reasonably large amounts of processing power are likely to be required compared with, say, approaches that first reduce the search space, as Dmitry suggested. Secondly, if they want you to actually draw up an algorithm, this could be quite challenging in the space of an interview.
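For a feel of what the basic method involves, here is a minimal sketch of an Ant System style loop for the single-salesman case, not the multi-type variant from the linked paper. Each ant builds a tour by picking the next city with probability proportional to pheromone and inverse distance; pheromone then evaporates and is reinforced along the tours found. The parameter values (`alpha`, `beta`, `evaporation`, number of ants and iterations) are illustrative assumptions, and the sketch assumes all points are distinct.

```python
import math
import random

def ant_system(points, n_ants=10, n_iterations=50, alpha=1.0, beta=3.0,
               evaporation=0.5, seed=0):
    """Return (best_tour, best_length) found by a basic Ant System loop."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")

    for _ in range(n_iterations):
        tours = []
        for _ in range(n_ants):
            tour = [0]                          # every ant starts at the depot
            unvisited = set(range(1, n))
            while unvisited:
                i = tour[-1]
                candidates = sorted(unvisited)
                # Attractiveness: pheromone^alpha * (1 / distance)^beta.
                weights = [pheromone[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in candidates]
                nxt = rng.choices(candidates, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length

        # Evaporate old pheromone, then deposit along this iteration's tours.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - evaporation
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                pheromone[a][b] += 1.0 / length
                pheromone[b][a] += 1.0 / length
    return best_tour, best_len

# Depot at (0, 0) followed by a few cities from the question.
points = [(0, 0), (420, 278), (421, 40), (29, 178), (350, 47), (298, 201)]
tour, length = ant_system(points)
print(tour, round(length, 1))
```

Because the pheromone matrix persists between iterations, the loop can in principle keep running after a destination is added or moved, which is the real-time property mentioned above; handling that properly is beyond this sketch.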

Interesting question :)

samocooper
  • You are giving me an English version of what I asked in my question above. My question clearly asks whether you can suggest different approaches to the problem, not different approaches to answering the question in an interview, which I do not know. – user1188611 Nov 21 '15 at 20:08
0

I'm no expert, but couldn't you just calculate the distance between the origin and all other points, find the nearest point, and then repeat the process from that point until you have covered every point?
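What is described here is usually called the nearest-neighbour heuristic. A minimal sketch, assuming straight-line distances and using a few cities from the question (the function names are illustrative), could look like this. It produces a valid route quickly but gives no guarantee that the route is short.

```python
import math

def nearest_neighbour_tour(start, cities):
    """Greedy route: from the current point, always go to the closest unvisited city."""
    tour, current, remaining = [start], start, list(cities)
    while remaining:
        nxt = min(remaining, key=lambda c: math.dist(current, c))
        remaining.remove(nxt)
        tour.append(nxt)
        current = nxt
    return tour

def path_length(tour):
    """Total length of the path, without returning to the start."""
    return sum(math.dist(a, b) for a, b in zip(tour, tour[1:]))

cities = [(420, 278), (421, 40), (29, 178), (350, 47), (298, 201), (417, 186)]
tour = nearest_neighbour_tour((0, 0), cities)
print(tour, round(path_length(tour), 1))
```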

Paul
  • No. For some examples you can get horrible results. You can end up going back and forth around the same point, just because the next one is a bit further away than the one on the opposite side. – Sorin Nov 18 '15 at 15:18
  • could you then just calculate all possible paths and take the one with the least distance? – Paul Nov 18 '15 at 15:25
  • Yes, but you have `n!` possible paths. – Sorin Nov 18 '15 at 15:27
  • Then I guess you would have to 'split the work' by using multiple threads, each calculating the possibilities from the third move onwards. I say from the third on because it would be easier to split up the work between the threads. Also, as soon as a path exceeds the smallest distance found so far, you should stop and go on to the next path. – Paul Nov 18 '15 at 15:34
  • 2
    Sure. Let's say you have a mammoth server with 128 cores just for this. Then you are going to look at n! / 128. 15! = 1'307'674'368'000 or 1.3 trillion things to check. If you check 1 million per second you still need to wait 15 days for it to finish. If you add just one more destination you need to wait 240 days for it to finish. There's no way you can optimize it enough to cover 100 destinations (and 100 is a small number in many cases). – Sorin Nov 18 '15 at 15:39
  • Well, you could use a combination of the 2 ideas. You could first find the closest 2 points, then start from each of them, find the nearest points, and repeat. This would reduce the number of calculations and somewhat prevent backtracking. If you were to change 2 to a larger number, it would execute more slowly but would be more accurate. – Paul Nov 18 '15 at 16:57
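The brute-force arithmetic from the comments above can be reproduced in a few lines. This is only a sketch under the same assumption as in the comment, a total of 1 million path checks per second; the constant and function names are illustrative.

```python
import math

CHECKS_PER_SECOND = 1_000_000   # assumed rate, taken from the comment above
SECONDS_PER_DAY = 86_400

def brute_force_days(n_destinations):
    """Days needed to enumerate all n! orderings at the assumed rate."""
    return math.factorial(n_destinations) / CHECKS_PER_SECOND / SECONDS_PER_DAY

for n in (10, 15, 16, 20):
    print(n, round(brute_force_days(n), 1))
```

For 15 destinations this gives roughly 15 days and for 16 roughly 240 days, matching the figures quoted in the comment; each extra destination multiplies the running time by the new value of n.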