Java In-memory Distributed Linked List

Question

I need to keep data in memory, distributed across nodes. Hazelcast and Apache Ignite support JCache and key-value pairs, but they distribute the data using their own algorithm (such as hashing).

My requirement is that the data (elements) be sorted by timestamp (one of the fields in the Java data object) and partitioned on the heap as a list (like a distributed linked list).

Ex: Let's say we have 4 Nodes.

```
List 1 on Node 1 -> element(1), element(2), element(3).
List 2 on Node 2 -> element(4), element(5), element(6).
List 3 on Node 3 -> element(7), element(8), element(9).
List 4 on Node 4 -> element(10), element(11), element(12).
```

element (n) transaction time < element (n+1) transaction time

The goal is to run an algorithm in memory on each node against its local data, without network calls.

score 0 · Answer 1 · answered Jul 18 '22 at 10:38

For Hazelcast, you probably want near-cache.

This lets the system distribute the data the way it should, but each node can keep a local copy of the data it is using.

You can override the distribution algorithm if you want certain pieces of data to be kept together. However, trying to control which node holds them prevents a distributed system from rebalancing the data to even out load.
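To make the co-location idea concrete, here is a minimal plain-Java sketch of deriving a partition key from a timestamp. It does not use Hazelcast itself: in Hazelcast, a key class implementing `PartitionAware` would return a value like this from `getPartitionKey()`, so that entries sharing the value land in the same partition. The `Txn` class, the `bucketOf` helper, and the 8-hour bucket width (UTC hours 0-7, 8-15, 16-23) are assumptions made for illustration only.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TimeBucketDemo {
    // Assumption: three 8-hour buckets per UTC day.
    static final int HOURS_PER_BUCKET = 8;

    // Hypothetical element type; only the timestamp drives placement.
    static final class Txn {
        final String id;
        final Instant timestamp;

        Txn(String id, Instant timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    // The value a partition-aware key could expose as its partition key.
    static int bucketOf(Instant timestamp) {
        return timestamp.atZone(ZoneOffset.UTC).getHour() / HOURS_PER_BUCKET;
    }

    // Groups elements by bucket and sorts each bucket by timestamp,
    // mimicking one time-ordered list per partition.
    static Map<Integer, List<Txn>> groupByBucket(List<Txn> txns) {
        Map<Integer, List<Txn>> byBucket = new TreeMap<>();
        for (Txn t : txns) {
            byBucket.computeIfAbsent(bucketOf(t.timestamp), k -> new ArrayList<>()).add(t);
        }
        for (List<Txn> bucket : byBucket.values()) {
            bucket.sort(Comparator.comparing((Txn t) -> t.timestamp));
        }
        return byBucket;
    }

    public static void main(String[] args) {
        List<Txn> txns = Arrays.asList(
            new Txn("c", Instant.parse("2022-07-18T17:30:00Z")),
            new Txn("b", Instant.parse("2022-07-18T03:15:00Z")),
            new Txn("a", Instant.parse("2022-07-18T01:00:00Z")));
        groupByBucket(txns).forEach((bucket, list) ->
            System.out.println("bucket " + bucket + " holds " + list.size() + " element(s)"));
    }
}
```

Note that a shared partition key only keeps a bucket's entries together; the cluster still decides which member owns that partition, and two buckets may end up on the same member.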

Neil Stevenson


Thanks for your response. With near-cache we have two issues: 1) it makes network calls for each key, which we want to avoid and which is an important constraint; 2) it increases the memory footprint (on the client side). – AneeshMohan0 Jul 18 '22 at 11:05
Can you please give me a reference to the docs on overriding the distribution algorithm in Hazelcast? – AneeshMohan0 Jul 18 '22 at 11:10
See [PartitionAware](https://docs.hazelcast.com/hazelcast/5.1/performance/data-affinity#partitionaware) for co-location. – Neil Stevenson Jul 19 '22 at 10:04

score 0 · Answer 2 · answered Jul 19 '22 at 08:07

In addition to Neil's near-cache advice, you should also look into the "Distributed Computing" section of the "Finding the Right Tool" chapter in the Hazelcast documentation. There are 3 ways to proceed:

Hazelcast Jet stream & batch engine - your pipelines (jobs) can also process data locally;
ExecutorService - allows you to execute your code on cluster members;
EntryProcessor - allows your code to process IMap entries locally on members.
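All three options share one idea: ship the computation to the member that holds the data and send back only a small result. The sketch below illustrates that idea in plain Java. It is a simulation under stated assumptions, not Hazelcast code: the "nodes" are in-JVM lists, the executor is the standard `java.util.concurrent.ExecutorService` rather than Hazelcast's distributed executor, and `maxGap` is a hypothetical stand-in for whatever algorithm runs on each node's time-sorted slice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LocalProcessingDemo {
    // Hypothetical per-node algorithm: largest gap between consecutive
    // timestamps in a slice that is already sorted ascending.
    static long maxGap(List<Long> sortedTimestamps) {
        long max = 0;
        for (int i = 1; i < sortedTimestamps.size(); i++) {
            max = Math.max(max, sortedTimestamps.get(i) - sortedTimestamps.get(i - 1));
        }
        return max;
    }

    // Runs maxGap once per simulated node; each task reads only its own slice
    // and returns a single number, so no element data crosses "node" boundaries.
    static List<Long> runOnEachNode(List<List<Long>> nodeSlices) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nodeSlices.size());
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (List<Long> slice : nodeSlices) {
                Callable<Long> task = () -> maxGap(slice);
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<Long>> nodeSlices = Arrays.asList(
            Arrays.asList(1L, 2L, 5L),     // simulated node 1
            Arrays.asList(10L, 11L),       // simulated node 2
            Arrays.asList(20L, 30L, 31L)); // simulated node 3
        System.out.println(runOnEachNode(nodeSlices));
    }
}
```

In a real cluster the task object would be serialized and sent to each member, so a network hop for the task and its result remains; what is avoided is moving the elements themselves.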

kwart


This solution still doesn't help, as we will have network overhead for each call. We want in-memory processing of data distributed as a time series, e.g. hours 00-08 on Node 1, 09-16 on Node 2, and 17-24 on Node 3. – AneeshMohan0 Jul 20 '22 at 20:01
