I'm new to Java EE and I'm not sure how to implement a particular requirement.

I need a large set (millions) of objects that maintain a bunch of rules and state and present an API for clients. Each of these objects is long-lived. Given that there are so many of them, we'll likely need to shard them across many VMs and use RMI to access them.

My question is whether there's a Java EE approach to locating an instance of the object, so that clients can get a reference to it without needing to know which machine it is on.

I'm aware of JNDI, but I'm not sure that registering each of the objects in a JNDI directory is appropriate. Do I need to write a "Locator" library that can make itself aware of the VM that each object belongs to?

Royce
  • Have you considered available solutions such as [Terracotta](http://www.terracotta.org) and [Coherence](http://www.oracle.com/technetwork/middleware/coherence/overview/index.html)? Or, do you have any specific/statutory requirements that preclude such solutions? – Alistair A. Israel Aug 17 '11 at 04:28
  • @AlistairIsrael I've got nothing against them; however, they only seem to help with distributing the objects' data. The compute time needs to be sharded as well. These solutions *could* be useful for distributing the registry, though. OTOH, something like Hazelcast would be a simple solution for registry distribution too. – Royce Aug 17 '11 at 04:43
  • I could've sworn Terracotta provided transparent clustering, making it appear to your app that it was running on one huge JVM. Their website doesn't make it clear whether they still do that, or maybe that's what [Terracotta DSO](http://www.terracotta.org/confluence/display/docs/Home) is all about. – Alistair A. Israel Aug 17 '11 at 05:46

2 Answers

Without more specific details, let me venture forth several avenues for exploration.

If I'm reading you correctly, what you want is something akin to a DHT but for hosting and looking up objects (code+data) or service nodes, not just raw data. I'm not aware of any such platform, though it sure sounds like an interesting idea.
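To make the DHT idea a little more concrete, here is a minimal sketch of DHT-style object location using consistent hashing: each object ID is hashed onto a ring, and the owning VM is the first node point found clockwise from that hash. The class name `ConsistentHashLocator`, the node names, the 100 virtual points per node, and the use of MD5 as the hash are illustrative assumptions, not taken from any particular platform; a real system would also need membership changes, replication, and failover.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical locator: maps an object ID to the VM that hosts it via a hash ring. */
public class ConsistentHashLocator {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashLocator(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) {
            addNode(node);
        }
    }

    /** Places several virtual points per node on the ring to even out the load. */
    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** The owner is the first ring point at or after the key's hash, wrapping around. */
    public String locate(String objectId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes registered");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(objectId));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** First 8 bytes of an MD5 digest as a long; any well-mixed hash would do. */
    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex); // every Java platform must support MD5
        }
    }

    public static void main(String[] args) {
        ConsistentHashLocator locator =
                new ConsistentHashLocator(List.of("vm-a", "vm-b", "vm-c"), 100);
        System.out.println("object-42 lives on " + locator.locate("object-42"));
    }
}
```

With this scheme, adding or removing a VM only reassigns the objects whose nearest ring point changes (roughly 1/N of them), instead of reshuffling almost everything as a plain `hash % N` assignment would.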

Java EE itself (as a spec) doesn't specify, nor does the reference implementation provide, an "out of the box" solution for the massively distributed clustering and sharding I think you're looking for.

GlassFish (the Java EE RI) itself uses Shoal as a clustering framework, which can use either Grizzly or JGroups as the underlying group communications platform.

So, in your particular case, I would look into building on JGroups for group communications. Then, instead of a central registry, rely on a DHT for service/object location. Look at how existing, successful DHT-style platforms (memcached, Apache Cassandra) implement partitioning and lookup, fault tolerance, and failover, and adapt those techniques. Then you can use RMI/RPC for client-to-server (service node) invocations.
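For the last step (RMI invocations against the node the locator picked), here is a minimal sketch using plain `java.rmi`. The `RuleObject` interface and its `apply` method are hypothetical stand-ins for the rules/state API described in the question. For brevity it exports the object and calls it inside one JVM; in a real deployment each node would export its objects and clients would obtain the stub from the node chosen by the locator, for example through that node's RMI registry. With millions of objects, a single per-node facade that takes an object ID may be simpler than exporting every object individually; that is a design assumption, not a requirement of RMI.

```java
import java.io.UncheckedIOException;
import java.rmi.NoSuchObjectException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

public class RmiNodeSketch {
    /** Hypothetical remote API for one long-lived rules/state object. */
    public interface RuleObject extends Remote {
        int apply(int input) throws RemoteException;
    }

    /** Server-side implementation; its state lives on the hosting node. */
    static class RuleObjectImpl implements RuleObject {
        private int total; // long-lived state kept on the server side

        @Override
        public synchronized int apply(int input) {
            total += input;
            return total;
        }
    }

    /** Exports one object, calls it through its RMI stub, then unexports it. */
    static int[] demo(int... inputs) {
        RuleObjectImpl impl = new RuleObjectImpl();
        try {
            // Port 0 = any free port; the returned stub is what a client would hold.
            RuleObject stub = (RuleObject) UnicastRemoteObject.exportObject(impl, 0);
            int[] results = new int[inputs.length];
            for (int i = 0; i < inputs.length; i++) {
                results[i] = stub.apply(inputs[i]); // call goes through the RMI stub
            }
            return results;
        } catch (RemoteException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                // Stop accepting remote calls for this object.
                UnicastRemoteObject.unexportObject(impl, true);
            } catch (NoSuchObjectException ignored) {
                // Export failed, so there is nothing to clean up.
            }
        }
    }

    public static void main(String[] args) {
        int[] r = demo(5, 7);
        System.out.println(r[0] + " " + r[1]); // prints "5 12"
    }
}
```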

Hope I'm making sense, and good luck! If you do roll this out yourself, see if you can open source it. ;)

Alistair A. Israel

I may not answer your question directly, but I know that Oracle Coherence can not only distribute data but also distribute computation against that data.

Here is a simple example. You write your calculation in a class that implements `com.tangosol.util.InvocableMap.EntryProcessor`. This lets the calculation run on the server where the data lives. One restriction is that the data needs to be serializable, because it moves across the network (as does the processor itself, which is sent to the node that owns the entry).

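```java
import com.tangosol.util.InvocableMap;

// The processor is serialized and sent to the node that owns each entry.
public class CalcLogic implements InvocableMap.EntryProcessor {
    // ....

    // InvocableMap.Entry is the "data".
    // You write your calculation in this process method.
    public Object process(InvocableMap.Entry entry) {
        // YourObjectType is a placeholder for your own serializable class
        YourObjectType obj = (YourObjectType) entry.getValue();
        // do some calculation against obj here
        entry.setValue(obj); // store the updated object back in the cache
        return null;
    }

    // .... (e.g. processAll(Set), also required by the interface)
}
```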
user865871