
I want to develop an online path-searching web service. Given a graph G and two vertices V1 and V2, it should find the path between them that minimizes the sum of edge distances.

The problem is that G is very large: it contains nearly 10 million edges.

If G were small enough, this would be simple. I would just:

  1. Store all the edges and vertices of G, and their relations, in an RDB (e.g. MySQL or PostgreSQL).
  2. Implement my own web service, in PHP or something similar, that searches for the shortest path in G.

My PHP script would:

  1. Select all edges and vertices from the RDB and build G in memory using PHP classes, associative arrays, or something similar.
  2. Apply Dijkstra's algorithm to the in-memory G and return the shortest path.
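
For reference, step 2 could be sketched roughly as below. This is written in C rather than PHP, and it assumes a directed graph with non-negative 32-bit weights stored in flat arrays; the `graph` struct, `heap_item`, and the function names are all made up for illustration. It uses a binary heap with lazy deletion, which is one common way to implement Dijkstra's algorithm, not the only one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DIST_INF UINT64_MAX

/* Hypothetical compact graph: the outgoing edges of vertex v are
 * targets[offsets[v]] .. targets[offsets[v + 1] - 1], with matching weights. */
typedef struct {
    uint32_t num_vertices;
    const uint32_t *offsets;  /* num_vertices + 1 entries */
    const uint32_t *targets;  /* one entry per edge */
    const uint32_t *weights;  /* one entry per edge, non-negative */
} graph;

typedef struct { uint64_t dist; uint32_t vertex; } heap_item;

/* Insert an item into a binary min-heap ordered by dist. */
static void heap_push(heap_item *h, size_t *n, heap_item it) {
    size_t i = (*n)++;
    while (i > 0 && h[(i - 1) / 2].dist > it.dist) {
        h[i] = h[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h[i] = it;
}

/* Remove and return the item with the smallest dist. */
static heap_item heap_pop(heap_item *h, size_t *n) {
    heap_item top = h[0];
    heap_item last = h[--(*n)];
    size_t i = 0;
    for (;;) {
        size_t c = 2 * i + 1;
        if (c >= *n) break;
        if (c + 1 < *n && h[c + 1].dist < h[c].dist) c++;
        if (h[c].dist >= last.dist) break;
        h[i] = h[c];
        i = c;
    }
    h[i] = last;
    return top;
}

/* Fills dist[] and prev[] (num_vertices entries each).
 * Unreachable vertices keep DIST_INF / UINT32_MAX.
 * Returns 0 on success, -1 if the heap could not be allocated. */
int dijkstra(const graph *g, uint32_t src, uint64_t *dist, uint32_t *prev) {
    size_t num_edges = g->offsets[g->num_vertices];
    /* Each edge is relaxed at most once, so at most num_edges + 1 pushes. */
    heap_item *heap = malloc((num_edges + 1) * sizeof *heap);
    size_t n = 0;
    if (!heap) return -1;
    for (uint32_t v = 0; v < g->num_vertices; v++) {
        dist[v] = DIST_INF;
        prev[v] = UINT32_MAX;
    }
    dist[src] = 0;
    heap_push(heap, &n, (heap_item){0, src});
    while (n > 0) {
        heap_item cur = heap_pop(heap, &n);
        if (cur.dist > dist[cur.vertex]) continue;  /* stale heap entry */
        for (uint32_t e = g->offsets[cur.vertex]; e < g->offsets[cur.vertex + 1]; e++) {
            uint32_t to = g->targets[e];
            uint64_t nd = cur.dist + g->weights[e];
            if (nd < dist[to]) {
                dist[to] = nd;
                prev[to] = cur.vertex;
                heap_push(heap, &n, (heap_item){nd, to});
            }
        }
    }
    free(heap);
    return 0;
}
```

The path from V1 to V2 can then be read backwards by following `prev[]` from V2 until it reaches V1.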

This approach would not work for a huge G, for the following reasons:

  • It takes a long time to build G in memory.
  • Each edge uses a lot of memory. PHP objects are flexible, but that flexibility is not needed here.

This means that the graph must already be built in memory before any search request arrives, and that each vertex and edge object should be much more lightweight.
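
To make "lightweight" concrete, one possible layout is a compressed sparse row (CSR) style set of flat arrays instead of one object per edge. The sketch below assumes vertex IDs and weights both fit in 32 bits and that edges are directed; `edge_row`, `csr_graph`, `csr_build`, `csr_free`, and `csr_bytes` are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical edge row, e.g. as read once from the RDB. */
typedef struct { uint32_t from, to, weight; } edge_row;

/* Flat-array graph: the edges leaving v are stored at
 * indices offsets[v] .. offsets[v + 1] - 1 of targets[] and weights[]. */
typedef struct {
    uint32_t num_vertices, num_edges;
    uint32_t *offsets;  /* num_vertices + 1 entries */
    uint32_t *targets;  /* num_edges entries */
    uint32_t *weights;  /* num_edges entries */
} csr_graph;

void csr_free(csr_graph *g) {
    free(g->offsets);
    free(g->targets);
    free(g->weights);
    g->offsets = g->targets = g->weights = NULL;
}

/* Builds the arrays with a counting sort on the source vertex.
 * Returns 0 on success, -1 on allocation failure. */
int csr_build(csr_graph *g, const edge_row *rows,
              uint32_t num_edges, uint32_t num_vertices) {
    size_t nv = num_vertices, ne = num_edges;
    uint32_t *cursor = malloc((nv + 1) * sizeof *cursor);
    g->num_vertices = num_vertices;
    g->num_edges = num_edges;
    g->offsets = calloc(nv + 1, sizeof *g->offsets);
    g->targets = malloc((ne + 1) * sizeof *g->targets);  /* +1 avoids malloc(0) */
    g->weights = malloc((ne + 1) * sizeof *g->weights);
    if (!cursor || !g->offsets || !g->targets || !g->weights) {
        free(cursor);
        csr_free(g);
        return -1;
    }
    /* Pass 1: count the out-degree of every vertex. */
    for (size_t i = 0; i < ne; i++) {
        assert(rows[i].from < num_vertices && rows[i].to < num_vertices);
        g->offsets[rows[i].from + 1]++;
    }
    /* Prefix sum turns the counts into start offsets. */
    for (size_t v = 0; v < nv; v++) g->offsets[v + 1] += g->offsets[v];
    /* Pass 2: drop each edge into the next free slot of its source vertex. */
    memcpy(cursor, g->offsets, (nv + 1) * sizeof *cursor);
    for (size_t i = 0; i < ne; i++) {
        uint32_t pos = cursor[rows[i].from]++;
        g->targets[pos] = rows[i].to;
        g->weights[pos] = rows[i].weight;
    }
    free(cursor);
    return 0;
}

/* Total size of the three arrays in bytes. */
size_t csr_bytes(const csr_graph *g) {
    return ((size_t)g->num_vertices + 1 + 2 * (size_t)g->num_edges) * sizeof(uint32_t);
}
```

Under these assumptions each edge costs 8 bytes, so 10 million edges would need about 80 MB for `targets` and `weights`, plus 4 bytes per vertex for `offsets`.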

I decided to implement this service in C. I thought it would be much easier to write an Apache module than to implement a concurrent, high-performance network daemon from scratch (if there are better solutions, I would like to hear about them).

WELL, WHERE SHOULD I BUILD THE IN-MEMORY G?

As you know, the Apache web server is a multi-threaded, multi-process daemon. A naive module that builds its own in-memory G in each process would hold nearly 100 million edges in memory (10 copies of the same structure) on a server running 10 processes.

I want the module to be smarter and share a single in-memory G among all processes, no matter how many are running.
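
To illustrate the kind of sharing I mean, here is a rough sketch that is not Apache-specific: G is written once to a flat file by an offline step, and every process maps that file read-only with POSIX `mmap` and `MAP_SHARED`, so the kernel can back all mappings with the same physical pages. The file layout, `graph_header`, `shared_graph`, and the function names are hypothetical, and I have not tried this inside an Apache module.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical file layout: header, then offsets[], targets[], weights[]. */
typedef struct { uint32_t num_vertices, num_edges; } graph_header;

typedef struct {
    void *base;               /* start of the mapping */
    size_t length;            /* size of the mapping in bytes */
    const graph_header *hdr;
    const uint32_t *offsets;  /* num_vertices + 1 entries */
    const uint32_t *targets;  /* num_edges entries */
    const uint32_t *weights;  /* num_edges entries */
} shared_graph;

/* Offline step: dump the arrays into one flat file. Returns 0 on success. */
int shared_graph_write(const char *path, uint32_t nv, uint32_t ne,
                       const uint32_t *offsets, const uint32_t *targets,
                       const uint32_t *weights) {
    graph_header hdr = { nv, ne };
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(&hdr, sizeof hdr, 1, f) == 1
          && fwrite(offsets, sizeof *offsets, (size_t)nv + 1, f) == (size_t)nv + 1
          && fwrite(targets, sizeof *targets, ne, f) == ne
          && fwrite(weights, sizeof *weights, ne, f) == ne;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Maps the file read-only. Returns 0 on success, -1 on any error. */
int shared_graph_open(shared_graph *g, const char *path) {
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) != 0 || (size_t)st.st_size < sizeof(graph_header)) {
        close(fd);
        return -1;
    }
    size_t length = (size_t)st.st_size;
    void *base = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED) return -1;

    const graph_header *hdr = base;
    size_t need = sizeof *hdr
        + ((size_t)hdr->num_vertices + 1 + 2 * (size_t)hdr->num_edges) * sizeof(uint32_t);
    if (length < need) {  /* truncated or corrupt file */
        munmap(base, length);
        return -1;
    }
    g->base = base;
    g->length = length;
    g->hdr = hdr;
    g->offsets = (const uint32_t *)(hdr + 1);
    g->targets = g->offsets + hdr->num_vertices + 1;
    g->weights = g->targets + hdr->num_edges;
    return 0;
}

void shared_graph_close(shared_graph *g) {
    munmap(g->base, g->length);
}
```

If the parent process maps the file before it forks its children, the children inherit the mapping; alternatively each process can call `shared_graph_open` itself. Either way the data is read-only, so no locking should be needed for searches.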

Where do you think is the best place to build such a data structure in an Apache module?

Izumi Kawashima
  • Doesn't Apache like to spawn a bunch of processes? In which case, you'd need a dozen or more separate G's (though they could be cloned from the original, they'd still eventually have to be copied). You'd need Apache to run in some threaded mode in order to avoid that – cHao Apr 15 '12 at 16:54
  • What would happen if I used apr_reslist_* or apr_shm_* or something like that? – Izumi Kawashima Apr 16 '12 at 03:52
