I want to develop an online path-searching web service. Given two vertices V1 and V2 in a graph G, it returns the path between them that minimizes the sum of edge distances.
The problem is that G is incredibly large. It contains nearly 10 million edges.
If G were small enough, this would be simple. I would simply...
- Store all the edges/vertices of G and their relations in an RDB (e.g., MySQL or PostgreSQL).
- Implement my own web service, in PHP or something similar, that searches for the shortest path in G.
My own PHP script would...
- Select all edges/vertices from the RDB and build G in memory, using PHP classes or associative arrays or something.
- Apply Dijkstra's algorithm to the in-memory G and return the shortest path (a minimal sketch follows this list).
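For concreteness, here is a minimal Dijkstra sketch in C (the language I end up choosing below). The O(V^2) vertex scan and the tiny hard-coded graph in main() are illustrative simplifications only; a real 10-million-edge service would need a heap-based queue and a compact graph layout.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { int to; int w; } Edge;

/* Plain O(V^2) Dijkstra over adjacency lists. dist[] receives the
 * minimal edge-distance sum from src to every vertex. */
static void dijkstra(int n, Edge **adj, const int *deg, int src, long *dist)
{
    char *done = calloc(n, 1);
    for (int i = 0; i < n; i++)
        dist[i] = LONG_MAX;
    dist[src] = 0;

    for (int iter = 0; iter < n; iter++) {
        int u = -1;
        for (int i = 0; i < n; i++)   /* pick the closest unvisited vertex */
            if (!done[i] && dist[i] != LONG_MAX && (u < 0 || dist[i] < dist[u]))
                u = i;
        if (u < 0)
            break;                    /* remaining vertices are unreachable */
        done[u] = 1;
        for (int k = 0; k < deg[u]; k++) {   /* relax u's outgoing edges */
            Edge e = adj[u][k];
            if (dist[u] + e.w < dist[e.to])
                dist[e.to] = dist[u] + e.w;
        }
    }
    free(done);
}

int main(void)
{
    /* demo graph: 0->1 (4), 0->2 (1), 2->1 (2) */
    Edge e0[] = { {1, 4}, {2, 1} }, e2[] = { {1, 2} };
    Edge *adj[3] = { e0, NULL, e2 };
    int   deg[3] = { 2, 0, 1 };
    long  dist[3];

    dijkstra(3, adj, deg, 0, dist);
    printf("dist(0 -> 1) = %ld\n", dist[1]);  /* 3, via 0 -> 2 -> 1 */
    return 0;
}
```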
This approach would not work on a huge G, for the following reasons:
- It takes MUCH time to build the in-memory G.
- It uses lots of memory for each edge. PHP objects are convenient, but that convenience is not needed here.
This means that the network should be constructed in memory before any search request arrives, and that each vertex/edge object should be much more lightweight.
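By "lightweight" I mean something like a compressed-sparse-row layout: flat arrays instead of one object per edge. The field names and 32-bit widths below are my own assumptions:

```c
#include <stdint.h>

/* Compressed-sparse-row graph: three flat arrays, no per-edge objects.
 * At 4 bytes per target + 4 per weight, 10 million edges cost roughly
 * 80 MB plus 4 bytes per vertex -- versus hundreds of bytes per edge
 * for a graph built out of PHP objects. */
typedef struct {
    uint32_t  n_vertices;
    uint32_t  n_edges;
    uint32_t *row;     /* n_vertices + 1 offsets; edges leaving vertex v
                          occupy indices row[v] .. row[v+1]-1 below     */
    uint32_t *target;  /* n_edges head vertices                         */
    uint32_t *weight;  /* n_edges edge distances                        */
} Graph;

/* Visiting the out-edges of v is a tight loop over contiguous memory: */
static void visit_edges(const Graph *g, uint32_t v)
{
    for (uint32_t i = g->row[v]; i < g->row[v + 1]; i++) {
        uint32_t to = g->target[i], w = g->weight[i];
        (void)to; (void)w;  /* relax (to, w) here */
    }
}
```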
I decided to implement this service in C. I thought it would be much easier to implement an Apache module than to write a concurrent, high-performance network daemon from scratch (if there are better solutions, I would like to hear about them).
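For reference, the standard Apache 2.x module skeleton I would start from looks like this (mod_pathsearch is a placeholder name, and the handler body is a stub, not the actual search):

```c
#include <string.h>

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "ap_config.h"

/* Respond only to requests explicitly routed to this handler
 * (e.g., via "SetHandler pathsearch" in httpd.conf). */
static int pathsearch_handler(request_rec *r)
{
    if (!r->handler || strcmp(r->handler, "pathsearch") != 0)
        return DECLINED;

    ap_set_content_type(r, "text/plain");
    if (!r->header_only)
        ap_rputs("shortest path result would go here\n", r);  /* stub */
    return OK;
}

static void pathsearch_register_hooks(apr_pool_t *p)
{
    ap_hook_handler(pathsearch_handler, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA pathsearch_module = {
    STANDARD20_MODULE_STUFF,
    NULL,                       /* per-directory config creator */
    NULL,                       /* per-directory config merger  */
    NULL,                       /* per-server config creator    */
    NULL,                       /* per-server config merger     */
    NULL,                       /* configuration directives     */
    pathsearch_register_hooks
};
```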
WELL, WHERE SHOULD I BUILD THE IN-MEMORY G?
As you know, the Apache web server is a multi-process, multi-threaded daemon. A naive module that constructs its own in-memory G per process would end up holding nearly 100 million edges (10 copies of the same structure) on a server running 10 processes.
I want the module to be smarter, sharing one single in-memory G among all processes, no matter how many are running.
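One mechanism I am aware of, though I am not sure it is the right one, is to prebuild the graph into a flat binary file and mmap() it read-only with MAP_SHARED; every process mapping the file then shares a single physical copy through the OS page cache. A rough sketch (graph.bin and its layout are assumptions):

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a prebuilt graph file read-only. All processes that map the same
 * file share one physical copy of its pages via the page cache. */
static const void *map_graph(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }

    void *base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after close() */
    if (base == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return base;
}

/* e.g., call map_graph("graph.bin", &len) once per process at startup;
 * the kernel keeps only one copy of the data resident. */
```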
Where do you think is the best place to build such a data structure in an Apache module?