I'm working on distributing my Scala code across multiple machines to process a large graph ("part 1" of the question), and I'm using the Akka framework in the hope of using Actors and Remoting.
I read the documentation here, and it seems the approach in the example can be extended to do what I want, but I have a few concerns about this method:
1) How do we decide how many Actor instances to create? Is it a matter of trial and error to see what works best, or is there a more principled way to choose?
2) I am thinking of structuring my task the way the example does: a Master that spawns several Workers, all communicating with case classes as messages. What I want is to compute a metric (based on random walks) between every pair of vertices. I have a graph class that implements a method to calculate the metric given two vertices.
I will give each Worker two vertices 'u' and 'v' to calculate the metric for, and have the Worker return the value to the Master.
When the Master sends a message asking a Worker to calculate the metric, the Worker needs the graph structure. Do I just include the graph structure (i.e. the adjacency list, which is a HashMap) in the message? Does that add overhead by copying the graph each time, do all Workers simply share the same graph, or is there a better way to go about this?
3) Does the algorithm for calculating the metric between pairs of vertices need to be re-implemented inside the class that extends Actor, or is there a way for individual Actors to access the same graph structure and call the existing method? (I guess this is similar to the question above about passing the entire graph structure as part of the message.)
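To make questions 2 and 3 concrete, here is a minimal sketch of the alternative I'm considering: the graph is handed to each Worker once, through its constructor, so the messages carry only vertex ids. Everything named here is my own assumption for illustration: `Graph` and its `metric` are simplified stand-ins for my real class (the body is a placeholder, not the random-walk computation), `Compute`/`Result` are hypothetical message names, vertices are assumed to be `Int` ids, and the metric is assumed symmetric so each unordered pair is computed once. It uses classic Akka actors and needs `akka-actor` on the classpath.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Stand-in for my graph class. `metric` is a placeholder
// (1.0 if v is a neighbour of u), NOT the real random-walk metric.
final class Graph(val adjacency: Map[Int, Set[Int]]) {
  def metric(u: Int, v: Int): Double =
    if (adjacency.getOrElse(u, Set.empty[Int]).contains(v)) 1.0 else 0.0
}

// Hypothetical messages: they carry only vertex ids, never the graph.
final case class Compute(u: Int, v: Int)
final case class Result(u: Int, v: Int, value: Double)

object Worker {
  def props(graph: Graph): Props = Props(new Worker(graph))
}

// The Worker gets the graph once, via its constructor, and calls the
// existing method; the algorithm is not re-implemented in the actor.
class Worker(graph: Graph) extends Actor {
  def receive: Receive = {
    case Compute(u, v) => sender() ! Result(u, v, graph.metric(u, v))
  }
}

object Master {
  def props(graph: Graph, nrOfWorkers: Int): Props =
    Props(new Master(graph, nrOfWorkers))
}

// nrOfWorkers must be at least 1.
class Master(graph: Graph, nrOfWorkers: Int) extends Actor {
  // All local Workers receive a reference to the same immutable graph.
  private val workers: Vector[ActorRef] =
    Vector.fill(nrOfWorkers)(context.actorOf(Worker.props(graph)))

  // Each unordered pair once (assumes a symmetric metric).
  private val pairs: Vector[(Int, Int)] =
    graph.adjacency.keys.toVector.sorted
      .combinations(2).map(p => (p(0), p(1))).toVector

  private var pending = pairs.size

  override def preStart(): Unit = {
    if (pairs.isEmpty) context.system.terminate()
    // Simple round-robin distribution of pairs over the Workers.
    pairs.zipWithIndex.foreach { case ((u, v), i) =>
      workers(i % nrOfWorkers) ! Compute(u, v)
    }
  }

  def receive: Receive = {
    case Result(u, v, value) =>
      println(s"metric($u, $v) = $value")
      pending -= 1
      if (pending == 0) context.system.terminate()
  }
}

object Main extends App {
  val graph = new Graph(Map(1 -> Set(2, 3), 2 -> Set(1), 3 -> Set(1)))
  val system = ActorSystem("metrics")
  system.actorOf(Master.props(graph, nrOfWorkers = 2), "master")
}
```

As far as I understand, within a single JVM this only shares a reference to the same immutable `Graph`, so nothing is copied per message. What I'm unsure about is the remote case: I assume the graph would then have to be serializable and shipped to (or loaded on) each machine at least once, which is really what my question is about.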
Thanks! Regards, -kstruct