
I have 2750 city centers in Belgium. I need to know the distances between every 2 city centers. But that results in a matrix of 57MB, just to store those distances (not even the routes), so that scales terribly.

Instead, I am looking at using highway intersections as hubs. Basically, every city knows its nearby cities and its nearby hubs (= highway intersections). All hubs know the distance to each other.

So the distance from city A to a non-nearby city B can be calculated as cityA -> hubX -> hubY -> cityB. Because most cities typically have 3 nearby hubs, I might need to look at all 9 combinations and take the shortest. But in any case it should scale better memory-wise.
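In plain Java, the lookup I have in mind looks roughly like this (a minimal sketch; the class and parameter names are illustrative, and the tables are assumed to be precalculated):

```java
import java.util.Map;

public class HubDistance {

    // Assumed precalculated: each city's map of nearby hub ID -> distance
    // to that hub, and a hub-to-hub distance matrix indexed by hub ID.
    public static double distance(Map<Integer, Double> cityToHubA,
                                  Map<Integer, Double> cityToHubB,
                                  double[][] hubToHub) {
        double best = Double.POSITIVE_INFINITY;
        // With ~3 hubs per city this checks ~9 combinations.
        for (Map.Entry<Integer, Double> a : cityToHubA.entrySet()) {
            for (Map.Entry<Integer, Double> b : cityToHubB.entrySet()) {
                double d = a.getValue()
                        + hubToHub[a.getKey()][b.getKey()]
                        + b.getValue();
                best = Math.min(best, d);
            }
        }
        return best;
    }
}
```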

Now the problem: can I describe a highway intersection as a single point? Think about it: a highway consists of 2 roads (one in each direction), so a highway intersection has 4 roads at its center (not even counting the arms).

Geoffrey De Smet

1 Answer


Some ideas:

  1. you can store those distances off-heap or on-disk via MapDB or GraphHopper's simplistic DataAccess implementations, making them RAM-independent
  2. you can use float, which should take only ~30MB, or even short if you store whole kilometers
  3. you could try on-demand routing without storing anything, as it takes only a few ms to calculate a route. Disabling instructions and point calculation makes it roughly twice as fast. You could even skip calculating the distance and just use path.weight - this gives you another good speedup, but it requires somewhat lower-level GraphHopper usage and is only recommended if you know what you are doing.
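Idea 2 as a rough sketch (plain Java, no GraphHopper involved; the 2750 figure comes from your question):

```java
public class DistanceMatrix {

    private final short[] distances; // kilometers, flattened n x n matrix
    private final int n;

    public DistanceMatrix(int n) {
        this.n = n;
        // 2750 * 2750 shorts = ~15MB; floats would be ~30MB.
        this.distances = new short[n * n];
    }

    public void set(int from, int to, int km) {
        distances[from * n + to] = (short) km;
    }

    public int get(int from, int to) {
        return distances[from * n + to];
    }
}
```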

Now to your question. GraphHopper uses a graph model consisting of nodes (junctions) and edges (streets connecting junctions). Note that a roundabout still consists of multiple nodes, but in general it should be possible to use such a 'leaving' node as a hub ID.

I see two approaches to calculate those nodes:

  • either run the contraction hierarchies preparation and pick the 1000 highest-ranked nodes, defining them as hubs - this would be similar to what is described in the 'transit node routing' paper
  • or calculate routes from one city to e.g. all other cities (or just in 8 geographic directions) and find the last common node of two routes to identify candidate hubs

For both approaches you'll have to dig a bit deeper into GraphHopper, and you'll probably need the lower-level API.
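The second approach could be sketched like this, assuming you already have each route as an ordered list of node IDs starting at the same city (getting those lists is the part that needs the lower-level API):

```java
import java.util.List;

public class HubFinder {

    // Given two routes that start at the same city, each as an ordered
    // list of graph node IDs, return the last node they share before
    // diverging - a candidate hub. Returns -1 if they share nothing.
    public static int lastCommonNode(List<Integer> routeA, List<Integer> routeB) {
        int last = -1;
        int len = Math.min(routeA.size(), routeB.size());
        for (int i = 0; i < len; i++) {
            if (!routeA.get(i).equals(routeB.get(i))) {
                break;
            }
            last = routeA.get(i);
        }
        return last;
    }
}
```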

Karussell
  • First, why the ideas won't work in my specific case (although they are good advice otherwise): going out to disk is too slow (including SSD). A few ms per distance calculation is way too slow during solving (not during precalculation). I tried 32-bit numbers (floats are also 32 bit) and 20k locations use almost 2GB RAM: `(20k)²*4` is just that big. I want to scale up to 100k locations. – Geoffrey De Smet Sep 04 '14 at 12:02
  • Contraction-Hierarchy might not work well in my case, because the hub nodes aren't points in my dataset: the points are city centers, the hub nodes are highway related. So that leaves... – Geoffrey De Smet Sep 04 '14 at 12:04
  • ... calculating routes and finding the last common nodes. I like that idea a lot :) It's just not that straightforward to implement. But it does save me from hand-picking hub locations :) – Geoffrey De Smet Sep 04 '14 at 12:06
  • If you use shorts then 100K² would be only 19GB ;) and I can give you a dev server which has 32GB for that trial. Or you could also try a kind of delta coding for one row of the matrix and 'decompress' the row on demand. You said 'going out to disk is too slow (including SSD)' - did you test this? – Karussell Sep 04 '14 at 13:17
  • Do you have a kind of access schema, like mostly within one row, then another row, etc., or is it completely random access? – Karussell Sep 04 '14 at 13:19
  • When it solves with OptaPlanner: currently completely random access, about 3 or 6 distances per move, for about 200,000 moves per second. And it needs to work on a normal machine (no SSD, 8GB RAM, JDK 6). Ouch :) But the GraphHopper stuff is a separate precalculation to generate a dataset, so there I have more flexibility with regard to the machine etc. I'll write a blog post to explain the problem better later (it's a bit hard to explain here). – Geoffrey De Smet Sep 05 '14 at 06:29
  • Shorts are too small, and 38GB is 34GB too much ;) I am working on segmenting the matrix, and your suggestions on the low-level API of GraphHopper are probably going to help me do that. – Geoffrey De Smet Sep 05 '14 at 06:34
  • Shorts are not too small if you use e.g. 100m per unit or even just 1km - completely sufficient for the max route length in the USA – Karussell Sep 05 '14 at 09:56