0

I'm trying to create a custom wrapper to log the execution time spent to process each message performed by each vertex processor.

Using the DiagnosticProcessors.peekInputP/peekOutputP and processor.PeekWrappedP as inspiration, I ended up with the following codes:

public final class LogEngine {

    private LogEngine() {
    }

    @Nonnull
    public static <T> ProcessorMetaSupplier logP(@Nonnull ProcessorMetaSupplier wrapped) {
        return new WrappingProcessorMetaSupplier(wrapped, p ->
                new LogWrappedP<>(p));
    }

    @Nonnull
    public static <T> ProcessorSupplier logP(@Nonnull ProcessorSupplier wrapped) {
        return new WrappingProcessorSupplier(wrapped, p ->
                new LogWrappedP<>(p));
    }

    @Nonnull
    public static <T> DistributedSupplier<Processor> logP(@Nonnull DistributedSupplier<Processor> wrapped) {
        return () -> new LogWrappedP<>(wrapped.get());
    }    

}

And

public final class LogWrappedP<T> implements Processor {

    private final Processor wrappedProcessor;
    private static Logger logger;

    private ProcCtx ctx;

    public LogWrappedP(@Nonnull Processor wrappedProcessor) {
        checkNotNull(wrappedProcessor, "wrappedProcessor");

        this.wrappedProcessor = wrappedProcessor;
    }

    @Override
    public void init(@Nonnull Outbox outbox, @Nonnull Context context) {

        // Fix issue #595: pass a logger with real class name to processor
        // We do this only if context is ProcCtx (that is, not for tests where TestProcessorContext can be used
        // and also other objects could be mocked or null, such as jetInstance())
        if (context instanceof ProcCtx) {
            ProcCtx c = (ProcCtx) context;
            NodeEngine nodeEngine = ((HazelcastInstanceImpl) c.jetInstance().getHazelcastInstance()).node.nodeEngine;
            ILogger newLogger = nodeEngine.getLogger(
                    createLoggerName(wrappedProcessor.getClass().getName(), c.vertexName(), c.globalProcessorIndex()));
            ctx = new ProcCtx(c.jetInstance(), c.getSerializationService(), newLogger, c.vertexName(),
                    c.globalProcessorIndex(), c.processingGuarantee());
        }

        logger = LogManager.getLogger(wrappedProcessor.getClass().getName());

        wrappedProcessor.init(outbox, ctx);
    }

    @Override
    public boolean isCooperative() {
        return wrappedProcessor.isCooperative();
    }

    @Override
    public void process(int ordinal, @Nonnull Inbox inbox) {
        KafkaMessage msg = (KafkaMessage) inbox.peek();

        logger.info("START {} {} {} {} {}", ctx.vertexName(), getProcessorId(), getInstanceHost(), msg.getUuid(), Instant.now().toEpochMilli());

        wrappedProcessor.process(ordinal, inbox);

        logger.info("END {} {} {} {} {}", ctx.vertexName(), getProcessorId(), getInstanceHost(), msg.getUuid(), Instant.now().toEpochMilli());
    }

    @Override
    public boolean tryProcess() {
        return wrappedProcessor.tryProcess();
    }

    @Override
    public boolean complete() {
        return wrappedProcessor.complete();
    }

    @Override
    public boolean completeEdge(int ordinal) {
        return wrappedProcessor.completeEdge(ordinal);
    }

    @Override
    public boolean saveToSnapshot() {
        return wrappedProcessor.saveToSnapshot();
    }

    @Override
    public void restoreFromSnapshot(@Nonnull Inbox inbox) {
        wrappedProcessor.restoreFromSnapshot(inbox);
    }

    @Override
    public boolean finishSnapshotRestore() {
        return wrappedProcessor.finishSnapshotRestore();
    }

    protected int getProcessorId() {
        return ctx.globalProcessorIndex();
    }

    protected String getInstanceUUID() {
        return ctx.jetInstance().getCluster().getLocalMember().getUuid();
    }

    protected String getInstanceHost() {
        return ctx.jetInstance().getCluster().getLocalMember().getAddress().getHost();
    }    

}

Now I am able to use my wrapper in my app vertex:

Vertex kafkaSource = dag.newVertex("kafkaSource", streamKafkaP(properties, decodeKafkaMessage, topic))
        .localParallelism(2);

Vertex app = dag.newVertex("app", LogEngine.logP(ProcessFrameP::new))
        .localParallelism(2);

And get the expected results,

2018-02-18 08:47:04,024 INFO START app 1 172.21.0.1 bc407e15-e78e-4734-822d-1172485e6632 1518954424024
2018-02-18 08:47:04,108 INFO END app 1 172.21.0.1 bc407e15-e78e-4734-822d-1172485e6632 1518954424108
2018-02-18 08:47:04,681 INFO START app 1 172.21.0.1 82e38e7e-73b7-4729-8d28-4f7fc87700ad 1518954424681
2018-02-18 08:47:04,710 INFO END app 1 172.21.0.1 82e38e7e-73b7-4729-8d28-4f7fc87700ad 1518954424710
2018-02-18 08:47:05,524 INFO START app 1 172.21.0.1 16633f77-8af5-4ab1-b94a-6192022f904f 1518954425524
2018-02-18 08:47:05,551 INFO END app 1 172.21.0.1 16633f77-8af5-4ab1-b94a-6192022f904f 1518954425551
2018-02-18 08:47:06,518 INFO START app 1 172.21.0.1 29622922-4987-44d4-8def-186b415c8fa9 1518954426518
2018-02-18 08:47:06,533 INFO END app 1 172.21.0.1 29622922-4987-44d4-8def-186b415c8fa9 1518954426533
2018-02-18 08:47:07,457 INFO START app 1 172.21.0.1 ce016601-d7be-4382-bc81-1d6a75e8748b 1518954427457
2018-02-18 08:47:07,475 INFO END app 1 172.21.0.1 ce016601-d7be-4382-bc81-1d6a75e8748b 1518954427475
2018-02-18 08:47:08,358 INFO START app 1 172.21.0.1 6a0be934-3eb6-4e46-9f08-76c072304de6 1518954428358
2018-02-18 08:47:08,379 INFO END app 1 172.21.0.1 6a0be934-3eb6-4e46-9f08-76c072304de6 1518954428379

The problem is that the wrapper is not working for the kafkaSource vertex. I tried to understand the logic behind the peekOutputP, but I couldn't get a working version for the source vertex.

What should I do to have the wrapper also working in a source vertex ?

Oliv
  • 10,221
  • 3
  • 55
  • 76
Kleyson Rios
  • 2,597
  • 5
  • 40
  • 65

1 Answers1

0

Kafka source doesn't process items - it emits them. That's why the process method isn't called.

However, your process method is also incorrect. The process method is not required to process exactly one item from the inbox: it can process many or also zero items. You could use code similar to LoggingInbox, but you will only know that an item was taken, but you don't know when the processing started and finished.

Feasible method might be by wrapping AbstractProcessor, which is base class for most Jet's processors. It makes the process method final and only allows its subclasses to override the tryProcess(item) methods. This method processes only one item: but it can return false if it is not finished processing it and wants to be called again with the same item.

Here's the code:

/**
 * A processor that wraps another AbstractProcessor and measures 
 * time spent to process each item.
 */
public class MeasureTimeP extends AbstractProcessor {

    private final AbstractProcessor wrapped;
    private long spentTime;
    private TestInbox tmpInbox = new TestInbox();

    private MeasureTimeP(Processor wrapped) {
        this.wrapped = (AbstractProcessor) wrapped;
    }

    @Override
    protected boolean tryProcess(int ordinal, @Nonnull Object item) {
        // if tmpInbox is not empty, we are continuing with an item that was processed before
        if (tmpInbox.isEmpty()) {
            tmpInbox.add(item);
        } else {
            assert tmpInbox.peek().equals(item);
        }
        spentTime -= System.nanoTime();
        // process the item
        wrapped.process(ordinal, tmpInbox);
        spentTime += System.nanoTime();
        // if tmpInbox is now empty, the item was processed
        if (tmpInbox.isEmpty()) {
            getLogger().info("Processed item in " + NANOSECONDS.toMillis(spentTime) + " ms: " + item);
            spentTime = 0;
            return true;
        }
        return false;
    }

    @Nonnull
    public static ProcessorMetaSupplier measureTimeP(@Nonnull ProcessorMetaSupplier wrapped) {
        return new WrappingProcessorMetaSupplier(wrapped, MeasureTimeP::new);
    }

    @Nonnull
    public static ProcessorSupplier measureTimeP(@Nonnull ProcessorSupplier wrapped) {
        return new WrappingProcessorSupplier(wrapped, MeasureTimeP::new);
    }

    @Nonnull
    public static DistributedSupplier<Processor> measureTimeP(@Nonnull DistributedSupplier<Processor> wrapped) {
        return () -> new MeasureTimeP(wrapped.get());
    }

    // ######
    // Here is the rest of AbstractProcessor's methods where we'll just delegate
    // to the wrapped processor. You can add time measurement to those too.

    @Override
    public boolean tryProcessWatermark(@Nonnull Watermark watermark) {
        return wrapped.tryProcessWatermark(watermark);
    }

    @Override
    public boolean isCooperative() {
        return wrapped.isCooperative();
    }

    @Override
    protected void init(@Nonnull Context context) throws Exception {
        wrapped.init(getOutbox(), context);
    }

    @Override
    protected void restoreFromSnapshot(@Nonnull Object key, @Nonnull Object value) {
        assert tmpInbox.isEmpty();
        tmpInbox.add(entry(key, value));
        wrapped.restoreFromSnapshot(tmpInbox);
        assert tmpInbox.isEmpty();
    }

    @Override
    public boolean tryProcess() {
        return wrapped.tryProcess();
    }

    @Override
    public boolean completeEdge(int ordinal) {
        return wrapped.completeEdge(ordinal);
    }

    @Override
    public boolean complete() {
        return wrapped.complete();
    }

    @Override
    public boolean saveToSnapshot() {
        return wrapped.saveToSnapshot();
    }

    @Override
    public boolean finishSnapshotRestore() {
        return wrapped.finishSnapshotRestore();
    }
}

And the test:

public static void main(String[] args) {
    JetInstance instance = Jet.newJetInstance();

    IList<Integer> list = instance.getList("list");
    for (int i = 0; i < 50; i++) {
        list.add(i);
    }

    // we'll use this mapper to simulate slow operation. It will sleep according to the item value.
    DistributedFunction<Integer, Integer> mapFn = processTime -> {
        LockSupport.parkNanos(MILLISECONDS.toNanos(processTime));
        return processTime;
    };

    DAG dag = new DAG();
    Vertex source = dag.newVertex("source", readListP("list"));
    Vertex process = dag.newVertex("process", measureTimeP(mapP(mapFn)));
    dag.edge(between(source, process));

    instance.newJob(dag).join();

    instance.shutdown();
}
Oliv
  • 10,221
  • 3
  • 55
  • 76
  • hi @Oliv, I saw the LoggingOutbox but I couldn't get a working version for my case, I will try again. For the LogWrappedP.process(), I tried to have my logger in the tryProcess() method, but the logger was being called hundred of times, even without any message in the inbox. Having the logger in the process() method I got the result the previously I was expecting by the tryProcess(). – Kleyson Rios Feb 20 '18 at 16:39
  • I missed the point in your question: you try to measure time to process an item. I replaced the answer. – Oliv Feb 21 '18 at 21:09
  • Hi @Oliv, looks like this code is not compatible with Jet 0.5.1 that I'm using. Your sample code only works with inbox items, right ? I need to log information for the KafkaSource as well, and as you explained before input source will not trigger the tryProcess(). What changes should I do to get this wrapper also logging information for the KafkaSource vertex ? – Kleyson Rios Feb 25 '18 at 09:53
  • using now 0.6-SNAPSHOT to compatibility, but don't know yet how to log information for each Source input item. – Kleyson Rios Feb 25 '18 at 11:06
  • As I said, it's not clear to me how to measure item processing time for Kafka source. It's the duration of what? If you just want to see the emitted items, use standard `peekOutputP`: `Vertex source = dag.newVertex("source", peekOutputP(streamKafkaP(...)));` – Oliv Feb 26 '18 at 06:52
  • Hi @Oliv, as we can see in my example I'm not calculating the time to process each item. I'm just logging the timestamp on the start and end of each execution, and that it's what I would like to have in the Kafka source. Those logs will be sent to an external system for many metrics calculations. And that's the reason why I would like to log the timestamp when the input source emit the item to the downstream processors. I will try to review the peekOutputP(). – Kleyson Rios Feb 26 '18 at 09:44
  • `peekOutputP` uses a logger, it logs the emission time if configured. You log the start & end incorrectly, you can't use `inbox.peek()` meaningfully in `process(inbox)` method. – Oliv Feb 27 '18 at 07:40