
The minimal working example below shows a simplified processing pipeline in which:

  • each event is processed individually (no windowing),
  • each event belongs to a certain group,
  • each event updates a group state, which is then used to generate some output value.
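import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.config.JetConfig;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
import com.hazelcast.jet.pipeline.test.TestSources;
import com.hazelcast.map.IMap;

import static com.hazelcast.jet.pipeline.JournalInitialPosition.START_FROM_OLDEST;

public class IMapExample {
    public static void main(String[] args) {
        // Sources.mapJournal() needs the event journal to be enabled on the "groups" map
        JetConfig config = new JetConfig();
        config.getHazelcastConfig().getMapConfig("groups").getEventJournalConfig().setEnabled(true);
        JetInstance jet = Jet.newJetInstance(config);
        IMap<Long, Double> groups = jet.getMap("groups");

        // p1: update the state of each group (key = group id, value = accumulated state)
        Pipeline p1 = Pipeline.create();
        p1.readFrom(TestSources.itemStream(10))
                .withoutTimestamps()
                .writeTo(Sinks.mapWithUpdating(groups,
                        event -> event.sequence() % 10, // simulate 10 groups
                        (oldState, event) -> event.sequence() + (oldState != null ? oldState : 0.0) // update group state with the given event
                ));

        // p2: emit an output value whenever a group's state changes
        Pipeline p2 = Pipeline.create();
        p2.readFrom(Sources.mapJournal(groups, START_FROM_OLDEST))
                .withIngestionTimestamps()
                .map(x -> x.getKey() + " -> " + x.getValue()) // map group state to some output value
                .writeTo(Sinks.logger());

        jet.newJob(p2);
        jet.newJob(p1).join(); // the item stream is infinite, so this blocks until the job is cancelled
    }
}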

Given the example above, does Hazelcast Jet preserve data locality, in the sense that the code updating a group's state is invoked on the same node where that group's state is stored?

Follow-up question: if the StreamSource were replaced by a Kafka source whose topic is partitioned by the same groups, would that partitioning be preserved, and would it line up with the data locality of the pipeline?
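The follow-up question comes down to whether two independent partitioning schemes send a given group to the same member. The stdlib-only Java model below is a hypothetical sketch of that check, not Hazelcast's or Kafka's real algorithm: `partitionOf` uses `Long.hashCode` modulo the partition count and `memberOf` assigns partitions to members round-robin. In reality Hazelcast hashes the serialized key and keeps its own partition table (271 partitions by default), and Kafka's default partitioner hashes the serialized key bytes. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical): a group is "local" when the member that receives its
// input partition is also the member that owns the partition holding its state.
public class LocalityModel {

    // Hypothetical partitioner: hash the group key, then take it modulo the partition count.
    static int partitionOf(long groupKey, int partitionCount) {
        return Math.floorMod(Long.hashCode(groupKey), partitionCount);
    }

    // Hypothetical round-robin assignment of partitions to cluster members.
    static int memberOf(int partitionId, int memberCount) {
        return partitionId % memberCount;
    }

    // Returns the group keys whose input partition and state partition land on different members.
    static List<Long> nonLocalKeys(long groupCount, int memberCount,
                                   int inputPartitions, int statePartitions) {
        List<Long> result = new ArrayList<>();
        for (long key = 0; key < groupCount; key++) {
            int inputMember = memberOf(partitionOf(key, inputPartitions), memberCount);
            int stateMember = memberOf(partitionOf(key, statePartitions), memberCount);
            if (inputMember != stateMember) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same partitioning on both sides: every group is co-located.
        System.out.println(nonLocalKeys(10, 3, 271, 271)); // prints []
        // A 4-partition input topic vs. 271 state partitions: locality breaks for some groups.
        System.out.println(nonLocalKeys(10, 3, 4, 271));   // prints [4, 5, 6, 7, 8, 9]
    }
}
```

In this model, keying both sides by the same group id is not enough on its own: the partition functions and the partition-to-member assignments also have to agree. The model says nothing about where Jet actually executes the `mapWithUpdating` update function.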

user3078523