Let's start with a high-level idea: we can't efficiently find the best solution to the problem, for two reasons:
- We can't encode all real-life considerations into our target function.
- Even if we could, the problem would be computationally very hard.
The answer to both of these challenges in the article is to build a number of "good enough" solutions using a very basic target function (distance), so that people can then decide among them using other considerations. For this to work we want the solutions to be reasonably close to the best one and reasonably diverse. What they effectively use is a "probabilistic greedy algorithm" (a term I just made up; not sure if there is an official name for it).
The idea is that we build each set of best roads by adding one road at a time. A true greedy algorithm would, at each step, add the shortest road among all roads that connect some not-yet-connected block. Unfortunately, that way we get only one solution, and greedy algorithms often don't work well on such a complicated problem (sometimes you need to pick a longer road now so you can win by adding many shorter roads later). So what they do instead is:
Generate a set of potentially good candidates for the next road: for each not-yet-connected block, generate a few of the shortest roads linking it to the network.
Pick a random road from that set of candidates. Still, it is clear that not every candidate is equally good, and we want to prefer the shorter ones (i.e., pick them more often). To account for that, assign each road a weight that is steeply decreasing in its length, and then sample proportionally to the weights (not a weighted average, but a weighted random choice). In practice they use the weight

Wi = Li^(-n) with n = 8.

This is a pretty hard filter favoring the shorter roads (if a road is 20% longer, it is 4+ times less likely to be picked, since 1.2^8 ≈ 4.3). So the probability of a given road being taken is

Pi = Wi / Sum_j(Wj) = Li^(-n) / Sum_j[Lj^(-n)]
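A minimal sketch of that weighted pick in Python (my own illustration, not the authors' code; the function name and the candidate format are made up):

```python
import random

def pick_road(candidates, n=8):
    """Pick one candidate road at random, weighted by length^(-n).

    `candidates` is a list of (road, length) pairs; larger n biases
    the choice more strongly toward shorter roads.
    """
    weights = [length ** -n for _, length in candidates]
    # random.choices normalizes the weights, giving exactly
    # Pi = Li^(-n) / Sum_j[Lj^(-n)]
    road, _ = random.choices(candidates, weights=weights, k=1)[0]
    return road
```

With n = 8, a road of length 1.2 is picked roughly 4 times less often than a road of length 1.0, matching the "hard filter" behavior described above.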
Once the road is added you have a new, smaller problem, so you can repeat all the steps again.
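Putting the steps together, the whole construction can be sketched as a randomized Prim-style loop. This is only my reading of the article under simplifying assumptions (every block can be linked directly to any connected block, and `dist` is a made-up distance callback), not the authors' implementation:

```python
import random

def random_greedy_network(blocks, dist, k=3, n=8, seed=None):
    """Grow one candidate road network by probabilistic greedy steps.

    blocks: list of block ids; dist(a, b): length of a road between
    two blocks; k: how many shortest candidate roads to keep per
    unconnected block; n: exponent of the length weight.
    """
    rng = random.Random(seed)
    connected = {blocks[0]}          # start the network from one block
    roads = []
    while len(connected) < len(blocks):
        # Step 1: for each not-yet-connected block, keep its k
        # shortest candidate roads into the current network.
        candidates = []
        for b in blocks:
            if b in connected:
                continue
            links = sorted((dist(a, b), a, b) for a in connected)[:k]
            candidates.extend(links)
        # Step 2: sample one candidate with probability ~ length^(-n).
        weights = [length ** -n for length, _, _ in candidates]
        length, a, b = rng.choices(candidates, weights=weights, k=1)[0]
        roads.append((a, b, length))
        connected.add(b)             # smaller problem; repeat
    return roads
```

Running it several times with different seeds yields several "good enough" networks, which is exactly the point: a pool of near-shortest alternatives to choose among by other criteria.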