
I am evaluating Terracotta for my current problem. The process is CPU intensive and needs about 5-10 GB of working memory (RAM). Each object in memory is about 1 kilobyte in size and consists of a handful of primitive data types. The whole in-memory data set goes through thousands of iterations, and each iteration rewrites every object completely. The process takes days to finish.

The million-plus objects are partitioned and currently run on multi-core machines, but I need more compute power and much more RAM (for bigger problems). The data/objects processed by one thread are not shared with other threads.

Would Terracotta be a good solution? Would syncing the millions of objects to the clustering server be a bad enough bottleneck to render it ineffective?
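
For illustration, here is a rough, self-contained sketch of the kind of partitioned processing I run today (class names, object counts, and the update formula are just placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedRun
{
  // Stand-in for a ~1 KB object made of a handful of primitives.
  static class Item
  {
    double a, b, c;
    long id;
  }

  public static void main(String[] args) throws Exception
  {
    int threads = Runtime.getRuntime().availableProcessors();

    // One partition per thread; no partition is shared between threads.
    List<List<Item>> partitions = new ArrayList<List<Item>>();
    for (int t = 0; t < threads; t++)
    {
      List<Item> part = new ArrayList<Item>();
      for (int i = 0; i < 100000; i++)         // millions of objects in the real run
      {
        part.add(new Item());
      }
      partitions.add(part);
    }

    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int iter = 0; iter < 1000; iter++)    // thousands of iterations in practice
    {
      List<Future<?>> pending = new ArrayList<Future<?>>();
      for (final List<Item> partition : partitions)
      {
        pending.add(pool.submit(new Runnable()
        {
          public void run()
          {
            for (Item item : partition)        // every object is rewritten each iteration
            {
              item.a = item.a * 0.5 + 1.0;     // placeholder for the real update
              item.b += item.a;
              item.c -= item.b;
            }
          }
        }));
      }
      for (Future<?> f : pending)
      {
        f.get();                               // wait for this iteration to complete
      }
    }
    pool.shutdown();
  }
}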

1 Answer


I think Terracotta is best suited for caching and fast retrieval. For put throughput, I have seen around 10K batched puts per second per cache server instance. "Batch update" mode means you can put a collection of entries in one shot, which is much more efficient than issuing single puts.

Here is an example of a batched update:

// Assumes an Ehcache cache clustered with Terracotta; `cache`, `key` and
// `value` come from the surrounding application code.
import java.util.ArrayList;
import java.util.Collection;

import net.sf.ehcache.Element;

cache.setNodeBulkLoadEnabled(true);     // relax coherence while loading
try
{
  Collection<Element> entries = new ArrayList<Element>();
  while (...)                           // iterate over the entries to load
  {
    entries.add(new Element(key, value));
  }
  cache.putAll(entries);                // one batched put instead of many single puts
}
finally
{
  cache.setNodeBulkLoadEnabled(false);  // restore normal (coherent) mode
}
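
While bulk-load mode is enabled, the clustered cache relaxes its normal consistency guarantees so the batch can be streamed to the Terracotta server array with much less coordination overhead; that is why the finally block switches it back off before normal coherent operation resumes.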

Terracotta also has a BigMemory feature that can use memory outside of the JVM heap. To enable it, add the following to your ehcache.xml:

<cache name="com.xyz.MyPOJO" maxMemoryOffHeap="3g">
  <terracotta/>
</cache>

The example above will use 3 GB of RAM outside of your JVM heap. In general, you should not have a heap larger than about 4 GB; otherwise your JVM will spend a lot of cycles on GC, which in your case would slow the calculations even further.
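
For completeness, here is a minimal sketch of using that cache from Java code. It assumes the ehcache.xml above is on the classpath; the POJO class, its fields, and the key shown here are made up for illustration:

import java.io.Serializable;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class OffHeapExample
{
  // Hypothetical stand-in for com.xyz.MyPOJO; clustered/off-heap values must be Serializable.
  static class MyPOJO implements Serializable
  {
    final long id;
    final double score;

    MyPOJO(long id, double score)
    {
      this.id = id;
      this.score = score;
    }
  }

  public static void main(String[] args)
  {
    // Run the JVM with -XX:MaxDirectMemorySize at least as large as the off-heap store.
    CacheManager manager = CacheManager.newInstance();   // loads ehcache.xml from the classpath
    Cache cache = manager.getCache("com.xyz.MyPOJO");

    cache.put(new Element(42L, new MyPOJO(42L, 3.14)));  // overflows to off-heap as the heap tier fills

    Element hit = cache.get(42L);
    if (hit != null)
    {
      MyPOJO pojo = (MyPOJO) hit.getObjectValue();
      System.out.println(pojo.id + " -> " + pojo.score);
    }

    manager.shutdown();
  }
}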

Another alternative to check is "computing/data grid" solutions. You can start with http://www.gridgain.com and http://www.gigaspaces.com
