I am trying to implement a cluster computing system using Java and the Windows OS. I'm looking for a solution that
- is not too out of date
- is reasonably easy to install and set up
- has enough documentation to get started with the classes and methods without prior knowledge of MPI
- is at least somewhat user friendly
This might not be possible, but it would also be nice if it were reasonably close in usage to the Java concurrency framework (java.util.concurrent).
I initially learned a bit about the java.util.concurrent package and was easily able to write parallel programs on my local 8-core machine using the Runnable interface and ExecutorService, making all my classes thread-safe in the process. However, I have yet to find a standard mechanism for extending this programming model to clusters.
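To make the starting point concrete, here is a minimal sketch of the kind of local code I mean: Runnable tasks submitted to an ExecutorService, with a thread-safe accumulator. The workload (a sum of squares) and the names `LocalParallelSketch` and `sumOfSquares` are made up for illustration and are not my actual code. Ideally the cluster version would look roughly like this, with the tasks running on other machines.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LocalParallelSketch {

    // Hypothetical workload: sum the squares of 1..n, split across Runnable tasks.
    public static long sumOfSquares(int n, int threads) {
        AtomicLong total = new AtomicLong();              // thread-safe accumulator
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (n + threads - 1) / threads;          // ceiling division

        for (int t = 0; t < threads; t++) {
            final int lo = t * chunk + 1;
            final int hi = Math.min(n, (t + 1) * chunk);
            Runnable task = () -> {
                long partial = 0;
                for (long i = lo; i <= hi; i++) {
                    partial += i * i;
                }
                total.addAndGet(partial);                 // one atomic update per task
            };
            pool.execute(task);
        }

        pool.shutdown();                                  // accept no new tasks
        try {
            // Wait for the submitted tasks to finish before reading the total.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000, 8));
    }
}
```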
I then learned of a GitHub project called Java-Interop-Library (https://github.com/MicrosoftHPC/Java-Interop-Library) that could be used with Microsoft HPC Pack. I networked a few cloud computers on Amazon EC2 and installed HPC Pack. The Java-Interop-Library was a nightmare to compile and set up: I had to manually edit several batch files and even some Java code to get it compiled. By the time I had most of it working (but not everything), I started thinking that there had to be an easier way, and I started searching again.
My new search led me to MPJ-Express (http://mpj-express.org). I read through the documentation on the site, and it seems easy to set up. They even have documentation on how to integrate it with Eclipse and debug. However, I could not find any documentation on how the classes and methods are actually used (there's a simple hello world example, but it's not nearly enough).
More searching led me to MPIJava, Hadoop, and GridGain. Having no experience with MPI or MPJ, and knowing that MPJ grew out of MPIJava, I started trying to find documentation for MPIJava instead. I found some docs, but some of them are quite old and I'm not really sure I'm on the right track. I saw GridGain mentioned in another Stack Overflow post and went to their website. They seem to have a cluster computing framework, and a simple posted example even uses what appear to be classes built around Runnable objects, which seemed attractive to me given my experience with java.util.concurrent. I don't really know anything about Hadoop, other than that it might be a possibility.
I really just need some better direction on the best way to accomplish cluster computing in Java. I feel like I'm just spinning my wheels.