
I have a 3-node Hazelcast Jet cluster deployed with a few jobs that read from and write to Redis Streams.

As of now, a Redis Stream can be read either from a specific position, from the beginning, or only from new entries. If a job gets restarted for some reason (e.g. a bug), it should resume from where it left off.

Also, how do I code or configure my jobs so that they can retry failed messages, say when a runtime exception happens during processing, or the network connection to another web service or to the Redis cluster is lost?

1 Answer

You can configure Jet jobs for fault tolerance via JobConfig

JobConfig config = new JobConfig();
config.setProcessingGuarantee(ProcessingGuarantee.EXACTLY_ONCE)
      .setSnapshotIntervalMillis(3000);
Job job = instance.newJob(p, config);

This configures the job to take a snapshot every 3 seconds. If your job is restarted (which can happen due to down-scaling/up-scaling of the Jet cluster, or through a manual call to job.restart()), the source will continue from the last saved snapshot. Some messages can be emitted twice after a restart, so the sink has to handle that duplication, i.e. it should be idempotent.
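For example, a keyed map sink is naturally idempotent, because a replayed entry just overwrites the same key. A minimal sketch, assuming Jet 4.x and using a placeholder IMap source and sink instead of the actual Redis Streams connectors:

import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class IdempotentSinkSketch {
    public static void main(String[] args) {
        JetInstance instance = Jet.newJetInstance();

        Pipeline p = Pipeline.create();
        // The IMap source stands in for the real Redis Streams source here;
        // the point is the sink side.
        p.readFrom(Sources.<String, String>map("input"))
         // Writing entries keyed by a unique id into an IMap is idempotent:
         // a duplicate emission after a restart simply overwrites the same entry.
         .writeTo(Sinks.map("processed-events"));

        instance.newJob(p).join();
    }
}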

We use the Lettuce client for the Redis source, and you can configure the timeout via RedisURI; the Lettuce client will try to reconnect in case of a disconnection. However, any failure during processing will fail the job.
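A minimal sketch of setting that timeout, assuming Lettuce 5.x (the host, port and 10-second value are placeholders); the resulting URI is what you would pass to whichever Redis source/sink factory you use:

import io.lettuce.core.RedisURI;

import java.time.Duration;

public class RedisUriSketch {
    public static void main(String[] args) {
        RedisURI uri = RedisURI.builder()
                .withHost("redis.example.com")   // placeholder host
                .withPort(6379)
                // Command timeout; Lettuce also auto-reconnects by default after a disconnect.
                .withTimeout(Duration.ofSeconds(10))
                .build();
        System.out.println(uri);
    }
}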
