
I need to find the optimal path connecting two planar points. I'm given a function that determines the maximal advance velocity, which depends on both the location and the time.

My solution is based on Dijkstra's algorithm. First I cover the plane with a 2D lattice, considering only discrete points. Points are connected to their neighbors up to a specified order, to get sufficient direction resolution. Then I find the best path by a (sort of) Dijkstra's algorithm. Next I improve the resolution/quality of the found path: I increase the lattice density and the neighbor connectivity order, while restricting the search to points close enough to the already-found path. This may be repeated until the needed resolution is achieved.
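For concreteness, here's roughly what I mean by connecting neighbors "up to a specified order" k. This is a simplified sketch, not my actual code; keeping only coprime offsets drops offsets that merely repeat the direction of a shorter one, so a larger k gives finer direction resolution:

```cpp
#include <cstdlib>
#include <numeric>
#include <utility>
#include <vector>

// Enumerate lattice offsets (dx, dy) up to connectivity order k.
// Coprime filtering discards offsets that duplicate the direction of a
// shorter one (e.g. (2, 2) repeats (1, 1)), so each kept offset is a
// distinct direction.
std::vector<std::pair<int, int>> neighborOffsets(int k) {
    std::vector<std::pair<int, int>> offsets;
    for (int dx = -k; dx <= k; ++dx)
        for (int dy = -k; dy <= k; ++dy) {
            if (dx == 0 && dy == 0) continue;
            if (std::gcd(std::abs(dx), std::abs(dy)) == 1)
                offsets.emplace_back(dx, dy);
        }
    return offsets;
}
```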

This works generally well, but nevertheless I'd like to improve the overall algorithm performance. I've implemented several tricks, such as variable lattice density and neighbor connectivity order, based on the "smoothness" of the price function. However, I believe there's potential to improve Dijkstra's algorithm itself (as applied to my specific graph), which I couldn't fully realize yet.

First let's agree on the terminology. I split all the lattice points into 3 categories:

  • cold - points that have not been reached by the algorithm yet.
  • warm - points that have been reached, but not fully processed yet (i.e. their price still has potential for improvement).
  • stable - points that are fully processed.

At each step Dijkstra's algorithm picks the "cheapest" warm lattice point and tries to improve the price of its neighbors. Because of the nature of my graph, I get a kind of cloud of stable points, surrounded by a thin layer of warm points. At each step a warm point on the cloud perimeter is processed; it's added to the stable cloud, and the warm perimeter is (potentially) expanded. A simplified sketch of this step follows below.
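In sketch form the loop looks roughly like this; my real code also carries the time dimension and per-point context, which I omit here:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Sketch of the main loop. price[] is assumed pre-filled with +infinity
// and state[] with Cold; nbr[u][i] is the i-th neighbor of point u and
// cost[u][i] the (in reality time-dependent, here stubbed) edge price.
enum class State { Cold, Warm, Stable };

struct QueueEntry {
    double price;
    int    point;   // lattice point index
    bool operator>(const QueueEntry& o) const { return price > o.price; }
};

void dijkstra(int source,
              std::vector<double>& price,
              std::vector<State>& state,
              const std::vector<std::vector<int>>& nbr,
              const std::vector<std::vector<double>>& cost)
{
    std::priority_queue<QueueEntry, std::vector<QueueEntry>,
                        std::greater<QueueEntry>> warm;
    price[source] = 0.0;
    state[source] = State::Warm;
    warm.push({0.0, source});

    while (!warm.empty()) {
        QueueEntry top = warm.top();
        warm.pop();
        if (state[top.point] == State::Stable) continue;  // stale entry
        state[top.point] = State::Stable;   // cheapest warm point freezes
        for (std::size_t i = 0; i < nbr[top.point].size(); ++i) {
            int v = nbr[top.point][i];
            double np = top.price + cost[top.point][i];
            if (np < price[v]) {            // improve the neighbor's price
                price[v] = np;
                state[v] = State::Warm;     // cold/warm -> warm
                warm.push({np, v});
            }
        }
    }
}
```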

The problem is that warm points processed consecutively by the algorithm are usually spatially (and hence topologically) unrelated. A typical warm perimeter consists of hundreds of thousands of points. At each step the next warm point to process is spatially pseudo-random, so there's virtually no chance that two related points are processed one after another.

This creates a real problem with CPU cache utilization: at each step the CPU touches a pseudo-random memory location. Since the number of warm points is large, all the relevant data may not fit in the CPU cache (it's on the order of tens to hundreds of MB).

Well, this is indeed an implication of Dijkstra's algorithm: the whole idea is to explicitly pick the cheapest warm point, regardless of its other properties.

However, intuitively it's obvious that (in our specific case) points on one side of a big cloud perimeter have no influence on points on the other side, so there's no harm in swapping their processing order.

Hence I thought about ways of "adjusting" the warm point processing order without compromising the algorithm in general. I considered several ideas, such as dividing the plane into blocks and partially solving them independently until some criterion is met indicating that their solutions may interfere, or alternatively ignoring the interference and allowing "re-solving" (i.e. a transition from stable back to warm); a rough sketch follows below.

However, so far I could not find a rigorous method.
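To illustrate the block idea, here is a purely hypothetical sketch (the `slack` rule is just one guess at a criterion, not something I've validated; `QueueEntry` is as in the sketch above):

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical sketch: one warm queue per spatial block. A block is
// drained locally while its cheapest warm price stays within `slack` of
// the global minimum. This trades strict Dijkstra order for memory
// locality, so a point settled this way must be allowed to reopen
// (stable -> warm) if a cheaper price later arrives from a neighbor block.
struct BlockQueue {
    std::priority_queue<QueueEntry, std::vector<QueueEntry>,
                        std::greater<QueueEntry>> warm;
};

void drainBlock(BlockQueue& blk, double globalMin, double slack) {
    while (!blk.warm.empty() && blk.warm.top().price <= globalMin + slack) {
        // ... settle blk.warm.top().point and relax its neighbors here;
        // relaxations that cross a block boundary push into that block's
        // queue and may turn one of its stable points back into warm.
        blk.warm.pop();
    }
}
```

With `slack = 0` this degenerates to always picking the global minimum, i.e. plain Dijkstra's; the open question (the part I couldn't make rigorous) is how large the slack may be before optimality is lost, or how to detect when re-solving is needed.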

Are there any ideas on how to do this? Perhaps it's a known problem, with existing research and (hopefully) solutions?

Thanks in advance. And sorry for the long question.

  • Are there any other constraints on the path that can be taken? For example, if you knew that the x velocity was always positive you would know that all the points with the same x were independent and could be determined in a way that was nice to the cache. – Peter de Rivaz Mar 10 '12 at 20:35
  • @Peter de Rivaz: At any lattice points it's allowed to move in any direction. (if this is what you meant) – valdo Mar 10 '12 at 20:46
  • can you adapt the parallel Dijkstra http://www.cs.rice.edu/~vs3/comp422/lecture-notes/comp422-lec24-s08-v2.pdf so that you find the minimum in spatially nearby blocks (sequentially in your case)? your main loop is still global, but the minimisation is at least local (this may be a dumb suggestion, just thinking out loud). – andrew cooke Mar 10 '12 at 21:03
  • @andrew cooke: Thanks for the suggestion. It doesn't actually help, because finding the cheapest *warm* point (i.e. the "logarithmic" component of the complexity) is definitely not my current bottleneck. OTOH each step involves locating neighbor lattice points via a sort of quad-tree, plus their context information. All this leads to a large number of memory reads at every step, which could be avoided if the cache were properly utilized. – valdo Mar 10 '12 at 21:37

1 Answer


What you're describing is the motivation behind the A* search algorithm, a modification of Dijkstra's algorithm that can dramatically improve the runtime by biasing the search toward points that keep getting closer to the destination. A* never does more work than a naive Dijkstra's implementation, and it tends to expand the nodes on the warm frontier that are closest to the destination node.

Internally, A* works by augmenting Dijkstra's algorithm with a heuristic function that estimates the remaining distance to the target node. This means that if you can get a rough approximation of how far away a given node is from the destination, you can skip nodes that don't need to be processed in favor of nodes that are likely to be better.
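Schematically, the only change to your Dijkstra loop is the priority key. Here's a sketch, not a drop-in implementation; I'm assuming your edge costs are travel times and that you can bound the velocity from above globally (the names `vMax`, `heuristic`, etc. are mine):

```cpp
#include <cmath>

// One possible admissible heuristic for your setting: if vMax is a
// global upper bound on the velocity over all locations and times, the
// straight-line distance divided by vMax can never overestimate the
// true remaining travel time, which keeps A* optimal.
double heuristic(double x, double y, double goalX, double goalY, double vMax) {
    return std::hypot(goalX - x, goalY - y) / vMax;
}

// In the relaxation step, order the warm queue by g + h instead of g:
//   warm.push({newPrice, v});           // Dijkstra: key = g(v)
//   warm.push({newPrice + h(v), v});    // A*:       key = g(v) + h(v)
```

With h ≡ 0 this reduces to plain Dijkstra's, which gives you an easy sanity check for the implementation.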

A* was not designed as a cache-optimal algorithm, but I believe that the speed increase due to

  1. Expanding fewer nodes (less data to keep in the cache), and
  2. Expanding nodes closer to the goal (which were processed more recently, and are thus more likely to be in the cache)

will give you a large speedup along with better cache utilization.

Hope this helps!

templatetypedef
  • A lot of thanks for your answer, and sorry for the delayed response. I just took some time to understand the algorithm and how to adapt it to my case. The A* algorithm actually gives more or less what I asked for: a strict criterion for when it's possible to swap the point processing order. The criterion may also be slightly violated to achieve even better performance, at the expense of some potential optimality loss. I'm not finished with this yet, but meanwhile it seems very promising. – valdo Mar 14 '12 at 15:14