The minimal working example below presents a simplified processing scenario where:
- each event is processed individually (no windowing),
- each event belongs to a certain group,
- each event updates a group state, which is then used to generate some output value.
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
import com.hazelcast.jet.pipeline.test.TestSources;
import com.hazelcast.map.IMap;

import static com.hazelcast.jet.pipeline.JournalInitialPosition.START_FROM_OLDEST;

public class IMapExample {

    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();
        // note: the event journal must be enabled for the "groups" map,
        // otherwise the mapJournal source below will fail
        IMap<Long, Double> groups = jet.getMap("groups");

        Pipeline p1 = Pipeline.create();
        p1.readFrom(TestSources.itemStream(10))
          .withoutTimestamps()
          .writeTo(Sinks.mapWithUpdating(groups,
              event -> event.sequence() % 10, // simulate 10 groups
              (oldState, event) -> event.sequence() + (oldState != null ? oldState : 0.0) // update group state with the event
          ));

        Pipeline p2 = Pipeline.create();
        p2.readFrom(Sources.mapJournal(groups, START_FROM_OLDEST))
          .withIngestionTimestamps()
          .map(x -> x.getKey() + " -> " + x.getValue()) // map group state to an output value
          .writeTo(Sinks.logger());

        jet.newJob(p2);
        jet.newJob(p1).join();
    }
}
Given the above example, does Hazelcast Jet preserve data locality? That is, is the code that updates a group's state invoked on the same node where that group's state is stored?
Follow-up question: if the StreamSource were replaced by a Kafka source partitioned by the same groups, would that partitioning be preserved, and would it correlate with the data locality of the pipeline?
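To make the follow-up concrete: the premise is that the Kafka producer and the Jet sink derive the group from the event with the same key function. The sketch below (all names hypothetical, no Jet or Kafka dependency) shows that assumption, and the comments note why matching keys alone do not imply matching partition layouts: as far as the defaults go, Kafka's partitioner hashes the key bytes (murmur2) modulo the topic's partition count, while Hazelcast hashes the serialized key over its own partition space (271 partitions by default), so the two placements are computed independently.

```java
import java.util.function.LongUnaryOperator;

public class GroupKeySketch {
    // Hypothetical: the same group-id function used on both sides,
    // mirroring `event.sequence() % 10` from the pipeline above.
    static final LongUnaryOperator GROUP_ID = seq -> seq % 10;

    public static void main(String[] args) {
        long kafkaSideKey = GROUP_ID.applyAsLong(42L); // producer keys the record
        long jetSideKey   = GROUP_ID.applyAsLong(42L); // sink extracts the IMap key
        // Same group id on both sides, so all events of a group
        // converge on the same IMap entry...
        System.out.println(kafkaSideKey == jetSideKey); // prints "true"
        // ...but Kafka assigns the record to murmur2(keyBytes) % topicPartitions,
        // while Hazelcast assigns the entry to hash(serializedKey) % 271 (default),
        // so identical keys do not automatically mean co-located data.
    }
}
```

In other words, even with aligned keys the question of locality reduces to whether Jet routes a Kafka partition's events to the member that owns the corresponding IMap partition, which is what the follow-up asks.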