Using Akka Actors/Remoting to distribute graph algorithms across a cluster

Question

I'm working on distributing my Scala code for a large graph across multiple machines ("part 1" of the question), and I'm using the Akka framework in the hope of using Actors and Remoting.

I read the document here and it seems the way the example was done can be extended to do what I want to do, but I have a few concerns regarding this method...

1) How do we decide how many instances of Actors we should create? Do we have to do a trial/error thing to see which is the best, or is there some more intuitive way to go about it?

2) I am thinking of doing my task similarly to how the example was done - with a Master that spawns several Workers and communicates with them using case classes as messages. What I want to do is to find some metric (random walk) between pairs of vertices, for all pairs. I have a graph class that implements a method to calculate the metric given two vertices.

I will give each Worker two vertices 'u' and 'v' to calculate the metric for, and have workers return the value.

When the Master sends messages to Workers to calculate the metric, the Worker needs the graph structure - do I just do this by including the graph structure (i.e. adjacency list that is a HashMap) in the message? Will this cause any overhead by copying the graph structure each time, or do all workers just share that graph, or is there a better way to go about this?

3) Does the algorithm for calculating the metric between pairs of vertices need to be re-implemented inside the class that extends Actor, or is there a way for individual Actors to access the same graph structure and call its method (I guess this is similar to the question above about passing the entire graph structure as part of the message)?

Thanks! Regards, -kstruct

Tomasz Nurkiewicz · Accepted Answer · 2012-07-14T19:08:19.577

1) How do we decide how many instances of Actors we should create?

Although actors abstract away the underlying thread management, creating fewer actors than there are available CPU cores wastes computational power. If you have 10 servers with 8 cores each, create at least 80 actors, 8 per machine.

If the algorithm is CPU-intensive, creating more won't give you a performance boost - the extra workers will simply wait for an available core.
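
As a rough illustration of that sizing rule, here is a minimal sketch in plain Scala. `WorkerSizing`, `workersPerMachine` and `totalWorkers` are hypothetical names made up for this example, not part of Akka; the only real API used is `Runtime.getRuntime.availableProcessors` from the JDK.

```scala
// Hypothetical helper (not an Akka API): one worker per core for CPU-bound work.
object WorkerSizing {
  // Defaults to the number of cores the JVM reports on the current machine.
  def workersPerMachine(cores: Int = Runtime.getRuntime.availableProcessors): Int =
    math.max(1, cores)

  // Total across a homogeneous cluster, e.g. 10 machines with 8 cores each.
  def totalWorkers(machines: Int, coresPerMachine: Int): Int =
    machines * workersPerMachine(coresPerMachine)
}
```

Treat the result as a starting point: if the workers turn out to block on I/O rather than CPU, measuring with a few different counts is still worthwhile.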

[...] Worker needs the graph structure - do I just do this by including the graph structure (i.e. adjacency list that is a HashMap) in the message? [...]

There is no overhead if all your actors live in the same JVM - you are simply passing a reference to the graph structure in a message. However, in a distributed environment this will cause the graph to be serialized and sent over the wire - probably a lot of data.

Consider having all actors share this data structure rather than passing it in every message.
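
One way to picture this, as a sketch only: give each worker the immutable graph once, when it is created, and keep the messages down to the two vertex ids. The code below is plain Scala with no Akka dependency; `Worker` is a stand-in for an actor class, and `Compute`, `Result`, `Graph` and the common-neighbour `metric` are all hypothetical placeholders (the real random-walk metric is not shown in the question).

```scala
// Messages carry only vertex ids, so they stay small even if serialized.
final case class Compute(u: Int, v: Int)
final case class Result(u: Int, v: Int, value: Double)

// Immutable adjacency structure: safe to share by reference within one JVM.
final class Graph(val adjacency: Map[Int, Set[Int]]) {
  // Placeholder metric (number of common neighbours), standing in for
  // the asker's own random-walk method.
  def metric(u: Int, v: Int): Double = {
    val nu = adjacency.getOrElse(u, Set.empty[Int])
    val nv = adjacency.getOrElse(v, Set.empty[Int])
    (nu intersect nv).size.toDouble
  }
}

// Stand-in for a worker actor: it receives the graph once, at construction,
// and calls the graph's existing method instead of re-implementing it.
final class Worker(graph: Graph) {
  def handle(msg: Compute): Result =
    Result(msg.u, msg.v, graph.metric(msg.u, msg.v))
}
```

With a real actor the same idea would presumably apply: pass the graph as a constructor argument and send only `Compute(u, v)`. In a cluster, each JVM would then need to load or build its own copy of the graph once at startup, which is still far cheaper than serializing it with every message.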

I don't understand question 3.

"passing a reference to the graph structure in a message" - doesn't that violate the immutability benefits? Are you referring to having the actors all race to use the same structure? - James, Jul 14 '12 at 18:54
@James: I assumed the graph data structure is immutable and known from the very beginning. Then all of the actors can safely access the same object. If it is being changed during processing, then you are right: it will cause headaches and simply won't work. - Tomasz Nurkiewicz, Jul 14 '12 at 19:09
