The minimal working example below presents a simplified processing scenario where:
- each event is processed individually (no windowing),
- each event belongs to a certain group,
- each event updates a group state, which is then used to generate some output value.
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
import com.hazelcast.jet.pipeline.test.TestSources;
import com.hazelcast.map.IMap;

import static com.hazelcast.jet.pipeline.JournalInitialPosition.START_FROM_OLDEST;

public class IMapExample {

    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();
        // note: the event journal must be enabled for the "groups" map,
        // otherwise the mapJournal source below will fail
        IMap<Long, Double> groups = jet.getMap("groups");

        Pipeline p1 = Pipeline.create();
        p1.readFrom(TestSources.itemStream(10))
          .withoutTimestamps()
          .writeTo(Sinks.mapWithUpdating(groups,
              event -> event.sequence() % 10, // simulate 10 groups
              (oldState, event) -> event.sequence() + (oldState != null ? oldState : 0.0) // update group state with the event
          ));

        Pipeline p2 = Pipeline.create();
        p2.readFrom(Sources.mapJournal(groups, START_FROM_OLDEST))
          .withIngestionTimestamps()
          .map(x -> x.getKey() + " -> " + x.getValue()) // map group state to an output value
          .writeTo(Sinks.logger());

        jet.newJob(p2);
        jet.newJob(p1).join();
    }
}
Given the above example, does Hazelcast Jet preserve data locality? That is, is the code that updates a group's state invoked on the same node where that group's state is stored?
Follow-up question: if the StreamSource were replaced by a Kafka source partitioned by the same groups, would that partitioning be preserved, and would it correlate with the data locality of the pipeline?
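To make the follow-up concrete: the premise is that the Kafka producer and the Jet sink derive the group from the event with the same key function. The sketch below (all names hypothetical, no Jet or Kafka dependency) shows that assumption, and the comments note why matching keys alone do not imply matching partition layouts: as far as the defaults go, Kafka's partitioner hashes the key bytes (murmur2) modulo the topic's partition count, while Hazelcast hashes the serialized key over its own partition space (271 partitions by default), so the two placements are computed independently.

```java
import java.util.function.LongUnaryOperator;

public class GroupKeySketch {
    // Hypothetical: the same group-id function used on both sides,
    // mirroring `event.sequence() % 10` from the pipeline above.
    static final LongUnaryOperator GROUP_ID = seq -> seq % 10;

    public static void main(String[] args) {
        long kafkaSideKey = GROUP_ID.applyAsLong(42L); // producer keys the record
        long jetSideKey   = GROUP_ID.applyAsLong(42L); // sink extracts the IMap key
        // Same group id on both sides, so all events of a group
        // converge on the same IMap entry...
        System.out.println(kafkaSideKey == jetSideKey); // prints "true"
        // ...but Kafka assigns the record to murmur2(keyBytes) % topicPartitions,
        // while Hazelcast assigns the entry to hash(serializedKey) % 271 (default),
        // so identical keys do not automatically mean co-located data.
    }
}
```

In other words, even with aligned keys the question of locality reduces to whether Jet routes a Kafka partition's events to the member that owns the corresponding IMap partition, which is what the follow-up asks.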