
I need to keep data in memory, distributed across nodes. I can see that Hazelcast and Apache Ignite both support JCache and key-value pairs, but they distribute the data using their own algorithm (such as hashing).

My requirement is that the elements be sorted by timestamp (one of the fields in the Java data object) and partitioned on the heap as a list (like a distributed linked list).

Ex: Let's say we have 4 Nodes.

```
List 1 on Node 1 -> element(1), element(2), element(3)
List 2 on Node 2 -> element(4), element(5), element(6)
List 3 on Node 3 -> element(7), element(8), element(9)
List 4 on Node 4 -> element(10), element(11), element(12)
```

Here, element(n) transaction time < element(n+1) transaction time.

The goal is to run an algorithm in memory on each node against its local data, without any network calls.

2 Answers


For Hazelcast, you probably want near-cache.

This lets the system distribute the data the way it should, but each node can keep a local copy of the data it is using.

You can override the distribution algorithm if you want certain pieces of data to be kept together. However, trying to control which node they end up on stops a distributed system from rebalancing the data to even out the load.
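As a rough sketch of what "keeping certain pieces of data together" could look like for time-ordered data: derive a bucket number from the timestamp and use it as the grouping key. The class, field names, and the 8-hour bucket width below are all assumptions for illustration, and the code is plain Java with no Hazelcast dependency. In Hazelcast, a value like `bucketOf(...)` is what a key class implementing `PartitionAware` would return from `getPartitionKey()`; entries with the same partition key share a partition, but the cluster still decides which member owns that partition.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TimeBucketPartitioning {

    /** Hypothetical data object; only the timestamp matters for this sketch. */
    static final class Element {
        final String id;
        final Instant transactionTime;

        Element(String id, Instant transactionTime) {
            this.id = id;
            this.transactionTime = transactionTime;
        }
    }

    /** Assumed bucket width: 24 hours split into three 8-hour ranges. */
    static final int BUCKET_HOURS = 8;

    /** Maps a timestamp to a bucket: hours 0-7 -> 0, 8-15 -> 1, 16-23 -> 2 (UTC). */
    static int bucketOf(Instant t) {
        int hour = t.atZone(ZoneOffset.UTC).getHour();
        return hour / BUCKET_HOURS;
    }

    /** Groups elements by bucket and sorts each bucket by transaction time. */
    static Map<Integer, List<Element>> group(List<Element> elements) {
        Map<Integer, List<Element>> buckets = new TreeMap<>();
        for (Element e : elements) {
            buckets.computeIfAbsent(bucketOf(e.transactionTime), k -> new ArrayList<>()).add(e);
        }
        for (List<Element> bucket : buckets.values()) {
            bucket.sort(Comparator.comparing((Element e) -> e.transactionTime));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Element> input = List.of(
                new Element("a", Instant.parse("2022-07-18T10:00:00Z")),
                new Element("b", Instant.parse("2022-07-18T02:00:00Z")),
                new Element("c", Instant.parse("2022-07-18T09:00:00Z")),
                new Element("d", Instant.parse("2022-07-18T20:30:00Z")));

        // Print each bucket with its elements in timestamp order.
        group(input).forEach((bucket, list) -> {
            StringBuilder ids = new StringBuilder();
            for (Element e : list) {
                ids.append(e.id).append(' ');
            }
            System.out.println("bucket " + bucket + " -> " + ids.toString().trim());
        });
    }
}
```

Each bucket is then a sorted list that a node-local algorithm could walk without touching the other buckets. The trade-off is the one mentioned above: fixed time ranges can leave some buckets much larger than others.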

Neil Stevenson
  • Thanks for your response. With near-cache we have two issues: 1) it makes network calls for each key, which we want to avoid as an important constraint; 2) it increases the memory footprint (on the client side). – AneeshMohan0 Jul 18 '22 at 11:05
  • Can you please point me to the docs on overriding the distribution algorithm in Hazelcast? – AneeshMohan0 Jul 18 '22 at 11:10
  • See [PartitionAware](https://docs.hazelcast.com/hazelcast/5.1/performance/data-affinity#partitionaware) for co-location. – Neil Stevenson Jul 19 '22 at 10:04

In addition to Neil's near-cache advice, you should also look into the Distributed Computing section within the Finding the Right Tool chapter of the Hazelcast documentation. There are 3 ways to proceed:

kwart
  • This solution still doesn't help, as we will have network overhead for each call. We want in-memory processing of data distributed as a time series, e.g. hours 00-08 on Node 1, 09-16 on Node 2, and 17-24 on Node 3. – AneeshMohan0 Jul 20 '22 at 20:01