
I have 3 distributed maps whose objects share one property, an identifier. This identifier is the key for one map, while the other 2 maps use cluster-wide global ids as keys. There's also a Map-Reduce job that combines the objects related by this identifier and stores the result in another map. The idea is to minimize network traffic between cluster members, so that the job only talks to the member it is executing on.

The question is: do I need to take any extra action to make sure the partitions of the different distributed maps are physically stored on the same member?

Viktor Stolbin

1 Answer


PartitionAware will do this for you.

If you want to guarantee three objects reside in the same partition, their key classes should implement PartitionAware and return the same result from the getPartitionKey() method.

For example, to keep all members of the same family together:

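import com.hazelcast.core.PartitionAware; // Hazelcast 3.x package; 4.x and later use com.hazelcast.partition
import java.io.Serializable;

public class Person implements PartitionAware<String>, Serializable {
    private String firstName;
    private String lastName;

    // Keys that return the same lastName are routed to the same partition.
    @Override
    public String getPartitionKey() {
        return this.lastName;
    }
}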

You can verify which partition a key maps to with hazelcastInstance.getPartitionService().getPartition(key).getPartitionId().
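Applied to the question's setup, the two maps keyed by global id could use a key that carries the shared identifier and exposes it as the partition key. This is only a rough sketch: `CoLocatedKeys`, `GlobalIdKey` and the `"order-42"` value are made-up names, and the nested `PartitionAware` interface is a local stand-in so the snippet compiles without Hazelcast on the classpath. Real code would implement Hazelcast's own interface instead.

```java
import java.io.Serializable;
import java.util.Objects;

public class CoLocatedKeys {

    // Local stand-in for Hazelcast's PartitionAware<T>, declared here only so the
    // sketch compiles on its own. Real code implements the library interface.
    public interface PartitionAware<T> {
        T getPartitionKey();
    }

    // Hypothetical key type for the two maps that are keyed by a global id.
    public static final class GlobalIdKey implements PartitionAware<String>, Serializable {
        private static final long serialVersionUID = 1L;

        private final long globalId;
        private final String identifier;

        public GlobalIdKey(long globalId, String identifier) {
            this.globalId = globalId;
            this.identifier = Objects.requireNonNull(identifier, "identifier");
        }

        // Routing is driven by the shared identifier, not by the global id.
        @Override
        public String getPartitionKey() {
            return identifier;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof GlobalIdKey)) {
                return false;
            }
            GlobalIdKey other = (GlobalIdKey) o;
            return globalId == other.globalId && identifier.equals(other.identifier);
        }

        @Override
        public int hashCode() {
            return Objects.hash(globalId, identifier);
        }
    }

    public static void main(String[] args) {
        // Two entries from different maps that relate to the same identifier.
        GlobalIdKey keyInSecondMap = new GlobalIdKey(1001L, "order-42");
        GlobalIdKey keyInThirdMap = new GlobalIdKey(2002L, "order-42");

        // Different keys, same partition key.
        System.out.println(keyInSecondMap.getPartitionKey().equals(keyInThirdMap.getPartitionKey()));
    }
}
```

The first map, which uses the identifier itself as its key, should land in the same partition as long as `getPartitionKey()` returns an equal value of the same type. One trade-off to be aware of: as far as I know, Hazelcast matches map keys on their serialized form, so a lookup in the other two maps then needs the full key (global id plus identifier), not the global id alone.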

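Partitions are shared across maps: partition 0 holds the first part of map X, map Y and map Z, partition 1 holds the next part of each, and so on. Keys that resolve to the same partition therefore live on the same member, whichever map they belong to.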
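To make the routing idea concrete, here is a toy model of it: the partition id is a hash of the partition key reduced modulo the partition count (271 by default). It is a sketch, not Hazelcast's real code. `PartitionRoutingSketch` and its methods are hypothetical names, and `String.hashCode()` stands in for the hash Hazelcast computes on the serialized key, so the ids it produces will not match a real cluster.

```java
import java.util.HashSet;
import java.util.Set;

public class PartitionRoutingSketch {

    // Hazelcast's default partition count.
    public static final int DEFAULT_PARTITION_COUNT = 271;

    // Stand-in routing: hash the partition key and reduce it modulo the partition count.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    public static int partitionId(String partitionKey, int partitionCount) {
        return Math.floorMod(partitionKey.hashCode(), partitionCount);
    }

    // Counts how many different partitions a set of partition keys is spread over.
    public static int distinctPartitions(String[] partitionKeys, int partitionCount) {
        Set<Integer> ids = new HashSet<>();
        for (String key : partitionKeys) {
            ids.add(partitionId(key, partitionCount));
        }
        return ids.size();
    }

    public static void main(String[] args) {
        String[] varied = new String[1000];
        String[] constant = new String[1000];
        for (int i = 0; i < varied.length; i++) {
            varied[i] = "id-" + i;   // a different partition key per entry
            constant[i] = "same";    // the same partition key for every entry
        }

        System.out.println("varied keys spread over "
                + distinctPartitions(varied, DEFAULT_PARTITION_COUNT) + " partitions");
        System.out.println("constant key spread over "
                + distinctPartitions(constant, DEFAULT_PARTITION_COUNT) + " partition(s)");
    }
}
```

Equal partition keys always give the same id, which is what co-locates related entries from all three maps. The second loop shows the flip side: a partition key that never varies puts every entry into a single partition.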

Neil Stevenson
  • The getPartitionKey() result is later fed into the hashing algorithm, is this correct? The default number of partitions is 271; I hope implementing PartitionAware does not change this number? – Viktor Stolbin Sep 14 '17 at 16:09
  • Correct, the hash of the routing key is taken modulo the partition count to determine the partition. The property `hazelcast.partition.count` can be changed regardless of whether `PartitionAware` is used. Fuad explains it well here: https://stackoverflow.com/questions/16497041/why-hazelcast-has-default-partition-count-of-271-and-what-are-the-parameters-to – Neil Stevenson Sep 15 '17 at 04:26
  • It's the partitions (which contain the entries) that are shared among the storage members. These need to be roughly the same size to avoid imbalance. Make sure that `getPartitionKey` doesn't return the same result for every input, or everything will end up in the same partition and your system will be in a mess. – Neil Stevenson Sep 15 '17 at 04:32