I have items in a Hazelcast IMap in OBJECT format, and I'm using a Jet aggregation operation with that IMap as a pipeline source. I was hoping, because of the OBJECT format, to avoid any serialisation/deserialisation of the items in my IMap during processing, the same way native Hazelcast entry processing and queries work. However, I can see that my items are in fact being serialised and then deserialised before being passed to my aggregator.

Is it possible to avoid the serialisation/deserialisation step when using Jet in this way? If so, how?

1 Answer

Yes, the local map reader will always serialize/deserialize the entries. The only workaround I can think of is a custom source that reads `map.localKeySet()`, followed by `mapUsingIMap` to do a join on those keys. The source would look like this:

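// Emits the keys owned by each member; `predicate` is a Hazelcast Predicate defined elsewhere.
SourceBuilder.batch("localKeys", c -> c.jetInstance().getMap("map"))
    .fillBufferFn((map, buf) -> {
        for (Object key : map.localKeySet(predicate)) {
            buf.add(key);
        }
        buf.close(); // all local keys have been emitted, so mark the batch as complete
    })
    .distributed(1) // run one source processor on each member
    .build();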
Can Gencer
  • Thanks, but I think `mapUsingIMap` ends up doing a serialise/deserialise so isn't actually any different to just using `Sources.map`. Assuming I'm right about this, I guess this means serialisation/deserialisation is unavoidable with Jet? – Chris Wiggins Aug 18 '20 at 10:41
  • I believe it should not deserialize the value. It uses `IMap.getAsync` underneath. – Can Gencer Aug 18 '20 at 12:28
  • I may be doing something wrong, but if I include `mapUsingIMap`, I see my items being serialised/deserialised. If I replace that with a simple `map` that returns a dummy value, I don't see any serialisation/deserialisation. It appears to come from `com.hazelcast.map.impl.operation.GetOperation.runInternal`, which has a comment "in case of a local call, we do make a copy" – Chris Wiggins Aug 18 '20 at 13:12
  • Yes, you are right, it's my confusion. We have a customer who actually patched Hazelcast to prevent this deserialization; however, this has not made it into a production release yet. It requires the data to be immutable to work, however. – Can Gencer Aug 19 '20 at 16:21
  • Yes, it obviously makes things more fragile by requiring the user to not screw up and modify the data. But I think it would be a very useful option to have, because if deserialising an item takes a similar amount of time to processing an item then this has a very serious impact on performance. – Chris Wiggins Aug 20 '20 at 06:11
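
The trade-off discussed in the last few comments (copy on every local read versus handing out the stored object and relying on immutability) can be illustrated with a minimal, self-contained sketch. This is not Hazelcast code: `CopyOnReadDemo`, `Item`, `getByReference` and `getCopy` are hypothetical names, and plain Java serialization is assumed as a stand-in for whatever serialization the map is actually configured with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an in-memory store holding values in object form.
// It only illustrates the copy-vs-reference trade-off; it is not Hazelcast code.
final class CopyOnReadDemo {

    /** A deliberately mutable value type. */
    static final class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        int count;

        Item(int count) {
            this.count = count;
        }
    }

    private final Map<String, Item> store = new HashMap<>();

    void put(String key, Item value) {
        store.put(key, value);
    }

    /** Returns the stored instance itself: no copy cost, but the caller shares state with the store. */
    Item getByReference(String key) {
        return store.get(key);
    }

    /** Returns a deep copy produced by a serialise/deserialise round trip. */
    Item getCopy(String key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(store.get(key));
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Item) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Copy via serialization failed", e);
        }
    }

    public static void main(String[] args) {
        CopyOnReadDemo demo = new CopyOnReadDemo();
        demo.put("a", new Item(1));

        demo.getCopy("a").count = 99;                        // mutates only the copy
        System.out.println(demo.getByReference("a").count);  // prints 1

        demo.getByReference("a").count = 99;                 // mutates the stored value itself
        System.out.println(demo.getByReference("a").count);  // prints 99
    }
}
```

The copying read pays for a full round trip on every access but keeps the stored value safe from callers; the by-reference read is free but is only safe if callers never modify what they receive, which is the immutability requirement mentioned above.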