How can I make Jet processors fault tolerant?

Question

I use the Hazelcast Jet core API to design new processors for my DAGs. Some of these processors may throw exceptions that, if left unhandled, cause the entire job to fail and stop.

I am therefore trying to design a mechanism that adds optional fault tolerance policies to my processors' code. For example, I'd like to handle errors by configuring one of these strategies:

failing as soon as an error happens (the current behaviour when exceptions are unhandled)
retrying a few times before failing
executing some fallback code that will typically catch the exception, audit the error and carry on.

However, given the way the Processor interface is designed, there is no way to provide this behaviour by generically decorating processors, unless they are internally designed to support these fault tolerance policies.

In fact, I would like to be able to somehow decorate them using a method that could be like this:

Processors.faultTolerantP(ProcessorMetaSupplier supplier, FaultTolerancePolicy policy)

where FaultTolerancePolicy describes one of the policies listed above.
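To make the three strategies concrete, here is a minimal sketch of what such a policy type could look like. The question does not show the real FaultTolerancePolicy, so the class shape, the factory names (failFast, retry, fallback) and the apply method are all assumptions made for illustration. The sketch uses only the JDK and no Jet API.

```java
import java.util.function.Consumer;

/** Hypothetical policy type; the real FaultTolerancePolicy is not shown in the question. */
final class FaultTolerancePolicy {
    enum Kind { FAIL_FAST, RETRY, FALLBACK }

    private final Kind kind;
    private final int maxRetries;
    private final Consumer<RuntimeException> auditor;

    private FaultTolerancePolicy(Kind kind, int maxRetries, Consumer<RuntimeException> auditor) {
        this.kind = kind;
        this.maxRetries = maxRetries;
        this.auditor = auditor;
    }

    /** Strategy 1: rethrow the first error. */
    static FaultTolerancePolicy failFast() {
        return new FaultTolerancePolicy(Kind.FAIL_FAST, 0, e -> { });
    }

    /** Strategy 2: retry up to maxRetries times, then rethrow the last error. */
    static FaultTolerancePolicy retry(int maxRetries) {
        return new FaultTolerancePolicy(Kind.RETRY, maxRetries, e -> { });
    }

    /** Strategy 3: hand the error to an auditor and carry on. */
    static FaultTolerancePolicy fallback(Consumer<RuntimeException> auditor) {
        return new FaultTolerancePolicy(Kind.FALLBACK, 0, auditor);
    }

    /**
     * Runs the action under this policy.
     * Returns true if the action succeeded, false if the fallback swallowed the error.
     */
    boolean apply(Runnable action) {
        int retries = 0;
        while (true) {
            try {
                action.run();
                return true;
            } catch (RuntimeException e) {
                if (kind == Kind.RETRY && retries < maxRetries) {
                    retries++;
                    continue; // try the same action again
                }
                if (kind == Kind.FALLBACK) {
                    auditor.accept(e);
                    return false;
                }
                throw e; // FAIL_FAST, or retries exhausted
            }
        }
    }
}

final class FaultTolerancePolicyDemo {
    public static void main(String[] args) {
        boolean ok = FaultTolerancePolicy
                .fallback(e -> System.err.println("audited: " + e.getMessage()))
                .apply(() -> { throw new IllegalStateException("bad item"); });
        System.out.println("completed normally: " + ok);
    }
}
```

A processor that knows how to re-run the handling of a single item could wrap just that step in apply. In a real Jet job the policy would probably also need to be serializable, since the suppliers that capture it are sent to the cluster members.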

So far, the only thing I have managed is to make my "fault tolerant processors" implement an IFaultTolerant interface, through which the policy can be injected into the processor. The processor code must then handle the fault tolerance policy "manually".

interface IFaultTolerant{
   void setFaultTolerancePolicy(FaultTolerancePolicy policy);
}

class MyProcessor extends AbstractProcessor implements IFaultTolerant{
   public void setFaultTolerancePolicy(FaultTolerancePolicy policy){
      // stores the policy and behaves as specified by the policy when errors occur
   }
}

class MyProcessors{
    public static ProcessorMetaSupplier faultTolerantP(ProcessorMetaSupplier supplier, FaultTolerancePolicy policy) {
        return new WrappingProcessorMetaSupplier(supplier, p -> faultTolerantP(p, policy));
    }

    private static Processor faultTolerantP(Processor p, FaultTolerancePolicy policy) {
        if (p instanceof IFaultTolerant) {
            ((IFaultTolerant)p).setFaultTolerancePolicy(policy);
        }
        return p;
    }
}

Do you have any advice about this? Would it be possible to intercept faults at a higher level, so that any processor can become fault tolerant without having to be designed for that?

It sounds similar to this issue: https://github.com/hazelcast/hazelcast-jet/issues/1733 maybe you could comment there as well? — Can Gencer, May 12 '20 at 20:43

score 0 · Answer 1 · answered May 12 '20 at 20:58

I'm not sure it's possible to handle this in a generic way outside of the processor. Retrying must be supported by the processor itself. For example, a processor may take an item from the inbox and then fail while handling it. The caller of the process method can catch the exception and retry, but the item is no longer in the inbox.
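The problem can be illustrated without Jet at all. The sketch below is only a simulation: a plain java.util.Queue stands in for the inbox, and the class and method names (InboxRetryDemo, pollThenHandle, peekThenRemove, callWithRetries) are hypothetical. It compares a processor that removes an item before handling it with one that removes it only after success, both called by the same generic retry loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

final class InboxRetryDemo {

    /** Fails on the "bad" item, otherwise records the item as processed. */
    static void handle(String item, List<String> out) {
        if (item.equals("bad")) {
            throw new IllegalArgumentException("cannot handle " + item);
        }
        out.add(item);
    }

    /** Style A: the item is taken out of the inbox before it is handled. */
    static void pollThenHandle(Queue<String> inbox, List<String> out) {
        for (String item; (item = inbox.poll()) != null; ) {
            handle(item, out);
        }
    }

    /** Style B: the item is removed from the inbox only after it was handled. */
    static void peekThenRemove(Queue<String> inbox, List<String> out) {
        for (String item; (item = inbox.peek()) != null; ) {
            handle(item, out);
            inbox.remove();
        }
    }

    /** Generic outer retry: calls the processor again after a failure. Returns the failure count. */
    static int callWithRetries(Runnable process, int maxAttempts) {
        int failures = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                process.run();
                return failures;
            } catch (RuntimeException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Queue<String> inboxA = new ArrayDeque<>(Arrays.asList("a", "bad", "b"));
        List<String> outA = new ArrayList<>();
        int failuresA = callWithRetries(() -> pollThenHandle(inboxA, outA), 3);
        System.out.println("A: failures=" + failuresA + " processed=" + outA + " left=" + inboxA);

        Queue<String> inboxB = new ArrayDeque<>(Arrays.asList("a", "bad", "b"));
        List<String> outB = new ArrayList<>();
        int failuresB = callWithRetries(() -> peekThenRemove(inboxB, outB), 3);
        System.out.println("B: failures=" + failuresB + " processed=" + outB + " left=" + inboxB);
    }
}
```

With style A the "retry" never sees the failed item again, so it is silently dropped rather than retried. With style B every attempt hits the same item, so the wrapper exhausts its attempts and never reaches the items behind it. The outer wrapper cannot tell which of the two it is dealing with, so a generic wrapper cannot choose the correct reaction.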

But if you know the processor is designed so that its calls can safely be retried or ignored, you can do this:

First create a processor wrapper that will catch and handle the exceptions:

public static class FaultTolerantProcessorWrapper implements Processor {

    private final Processor delegate;
    private final FaultTolerancePolicy policy;

    protected FaultTolerantProcessorWrapper(FaultTolerancePolicy policy, Processor delegate) {
        this.policy = policy;
        this.delegate = delegate;
    }

    @Override
    public void process(int ordinal, @Nonnull Inbox inbox) {
        try {
            delegate.process(ordinal, inbox);
        } catch (Exception e) {
            policy.handle(e);
            if (policy.isIgnore()) {
                // will ignore the entire batch of items, not just the item that failed
                inbox.clear();
            }
        }
    }

    // repeat for the other methods, including `init`, `isCooperative`, `tryProcessWatermark` and `complete`
}

Then use it like this:

// if your custom processor uses a ProcessorMetaSupplier
Vertex v = dag.newVertex("v", new WrappingProcessorMetaSupplier(
        YourProcessor.metaSupplier(),
        p -> new FaultTolerantProcessorWrapper(faultTolerancePolicy, p)));

// if your custom processor uses a ProcessorSupplier
Vertex v = dag.newVertex("v", new WrappingProcessorSupplier(
        YourProcessor.supplier(),
        p -> new FaultTolerantProcessorWrapper(faultTolerancePolicy, p)));

// if your custom processor uses a SupplierEx<Processor>
Vertex v = dag.newVertex("v",
        () -> new FaultTolerantProcessorWrapper(faultTolerancePolicy, new YourProcessor()));

But I personally would not use this approach; in my opinion it's fragile. I also would not recommend "enhancing" Jet's built-in processors: batch sources will very likely produce incorrect results, and streaming processors typically already have fault tolerance built in.

I initially tried the approach you describe, but the problem is that the AbstractProcessor.process method peeks the item from the Inbox and tries to process it, removing the item only after processing has succeeded. This means that, if that item is the one causing the issue, catching the exception will likely introduce a never-ending error loop, because I will end up processing the same item forever. So for now I have made each custom processor optionally implement a fault tolerance policy, and I enable or disable it using the decorator method Processors.faultTolerantP — Mirko Luchi, May 15 '20 at 07:18
@MirkoLuchi that was the point: correct behavior depends on the exact behavior of the processor. You can't add a generic way to do it. The Processor interface isn't suitable for that. — Oliv, May 15 '20 at 09:00
