I've just learned about SnappyData (and watched a few videos about it), and it looks interesting, mainly the claim that performance can be many times faster than that of a regular Spark job.
Could the following code (snippet) leverage SnappyData's capabilities to improve the job's performance while keeping the same behavior?
import java.util.Iterator;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.api.java.function.MapGroupsWithStateFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.KeyValueGroupedDataset;
import org.apache.spark.sql.streaming.GroupState;
import org.apache.spark.sql.streaming.StreamingQuery;

// Read the Kafka source as a stream of EventData beans.
Dataset<EventData> ds = spark
        .readStream()
        .format("kafka")
        (...) // Kafka options and .load(), elided
        .as(Encoders.bean(EventData.class));

// Group events by id so that state is kept per key.
KeyValueGroupedDataset<String, EventData> kvDataset = ds.groupByKey(
        new MapFunction<EventData, String>() {
            public String call(EventData value) throws Exception {
                return value.getId();
            }
        }, Encoders.STRING());

// Validate each group of events against its accumulated state.
Dataset<EventData> processedDataset = kvDataset.mapGroupsWithState(
        new MapGroupsWithStateFunction<String, EventData, EventData, EventData>() {
            public EventData call(String key, Iterator<EventData> values,
                    GroupState<EventData> state) throws Exception {
                /* state control code */
                return EventHandler.validate(key, values);
            }
        }, Encoders.bean(EventData.class), Encoders.bean(EventData.class));

StreamingQuery query = processedDataset.writeStream()
        .outputMode("update")
        .format("console")
        .start();
query.awaitTermination(); // block so the streaming query keeps running
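
From what I can tell from the SnappyData docs, the main change would be creating a SnappySession over the existing SparkContext and swapping the console sink for SnappyData's structured streaming sink, so the processed results land in an in-memory table. Is something like the sketch below the intended way to do it? (The "snappysink" format name and the tableName option are my reading of the SnappyData docs, and the table and checkpoint names are made up; none of this is tested.)

import org.apache.spark.sql.SnappySession;
import org.apache.spark.sql.streaming.StreamingQuery;

// Assumption: SnappySession wraps the existing SparkContext and exposes
// SnappyData's in-memory tables, per the SnappyData docs.
SnappySession snappy = new SnappySession(spark.sparkContext());

// Assumption: "snappysink" and "tableName" are the sink format and option
// named in SnappyData's structured streaming docs; the table name and
// checkpoint path here are hypothetical placeholders.
StreamingQuery snappyQuery = processedDataset.writeStream()
        .format("snappysink")
        .outputMode("update")
        .option("tableName", "processed_events")
        .option("checkpointLocation", "/tmp/snappy-checkpoint")
        .start();
snappyQuery.awaitTermination();

My understanding is that writing into a SnappyData table like this is what would let the co-located execution engine provide the advertised speedup, but I'd appreciate confirmation that the rest of the pipeline (groupByKey plus mapGroupsWithState) benefits at all.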