I wrote a basic custom processor that sends the flow file to a "RETRY" relationship and also calls penalize on it.

package nlsn.processors.core.main;

import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.nifi.annotation.behavior.ReadsAttribute;
import org.apache.nifi.annotation.behavior.ReadsAttributes;
import org.apache.nifi.annotation.behavior.WritesAttribute;
import org.apache.nifi.annotation.behavior.WritesAttributes;
import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.SeeAlso;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.annotation.lifecycle.OnScheduled;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.logging.ComponentLog;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.ProcessorInitializationContext;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

@Tags({ "wait", "wait on time"})
@CapabilityDescription("Wait on time")
@SeeAlso({})
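@ReadsAttributes({ @ReadsAttribute(attribute = "_wait_state", description = "Present once the flow file has been penalized and routed to RETRY") })
@WritesAttributes({ @WritesAttribute(attribute = "_wait_state", description = "Set to \"1\" before routing to RETRY; removed before routing to SUCCESS") })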
public class CustomWait extends AbstractProcessor {

    public static final Relationship SUCCESS_RELATIONSHIP = new Relationship.Builder()
            .name("SUCCESS").description("well done, carry on").build();

    public static final Relationship FAILURE_RELATIONSHIP = new Relationship.Builder()
            .name("FAILURE").description("fail").build();

    public static final Relationship POINT_TO_SELF_RELATIONSHIP = new Relationship.Builder()
            .name("RETRY").description("point it back to processor").build();

    private List<PropertyDescriptor> descriptors;

    private Set<Relationship> relationships;


    @Override
    protected void init(final ProcessorInitializationContext context) {

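        // No configurable properties: use an empty list so that
        // getSupportedPropertyDescriptors() never returns null.
        this.descriptors = Collections.emptyList();

        final Set<Relationship> relationships = new HashSet<>();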
        relationships.add(SUCCESS_RELATIONSHIP);
        relationships.add(FAILURE_RELATIONSHIP);
        relationships.add(POINT_TO_SELF_RELATIONSHIP);
        this.relationships = Collections.unmodifiableSet(relationships);
    }

    @Override
    public Set<Relationship> getRelationships() {
        return this.relationships;
    }

    @Override
    public final List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return descriptors;
    }

    @OnScheduled
    public void onScheduled(final ProcessContext context) {

    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        final ComponentLog logger = getLogger();
        FlowFile flowFile = session.get();
        if (flowFile != null) {
            logger.info("flow file is not null.");
            String state = flowFile.getAttribute("_wait_state");
            if (state == null || state.isEmpty()) {
                logger.info("\"_wait_state\" attribute is missing, going into WAIT.");
                flowFile = session.putAttribute( flowFile, "_wait_state", "1");
                flowFile = session.penalize(flowFile);
                session.transfer( flowFile, POINT_TO_SELF_RELATIONSHIP );
            } else {
                logger.info("\"_wait_state\" attribute is available, breaking WAIT.");
                flowFile = session.removeAttribute( flowFile, "_wait_state" );
                session.transfer( flowFile, SUCCESS_RELATIONSHIP); 
            }
        } else {
            //logger.info("flow file is null (bad)!!!.");
        }
    }
}

The code works as expected, and processing finished in 30 seconds, also as expected. But I am wondering why the task count (192,569) is so high.

(see CustomWait processor task count)

  1. What is NiFi running in the background?
  2. Does this large task count actually hog the CPU?
  3. If this is bad, how do I fix it?

Thanks

Rakesh Prasad

1 Answer

  1. The NiFi controller schedules a processor to run whenever there is a FlowFile (FF) in the queue feeding it, without checking whether the queued FFs are penalized. In onTrigger, the processor tries to pull an FF from its input queues with session.get(). That call never returns a penalized FF, so it returns null. This is why the null check is needed, and hitting it is not a sign that something is wrong. I'm assuming you didn't change the run schedule, which means the controller tries to run the processor as fast as possible. That is what inflates the task count.
  2. Each of those runs checks for input to process, so it does use CPU. Whether that amounts to hogging depends on the number of tasks available and on how many processors are running on the system.
  3. It is not inherently bad, but you can cut it down by setting a run schedule greater than 0.
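
To get a feel for the numbers, here is a minimal sketch in plain Java, not NiFi code. It assumes a toy scheduler that triggers the processor at a fixed interval for as long as the queue is non-empty, and a `get` helper that, like session.get(), refuses to hand out a penalized flow file. The names `PenaltySimulation`, `SimFlowFile`, `get`, and `countTasks` are made up for this illustration, and the model only looks at the head of the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a penalty-aware queue and a timer-driven scheduler; not NiFi code. */
class PenaltySimulation {

    /** Hypothetical stand-in for a FlowFile: it only tracks when its penalty ends. */
    static final class SimFlowFile {
        final long penaltyExpiresAtMillis;

        SimFlowFile(long penaltyExpiresAtMillis) {
            this.penaltyExpiresAtMillis = penaltyExpiresAtMillis;
        }
    }

    /** Mimics session.get(): returns the head only if its penalty has expired, else null. */
    static SimFlowFile get(Deque<SimFlowFile> queue, long nowMillis) {
        SimFlowFile head = queue.peek();
        if (head == null || head.penaltyExpiresAtMillis > nowMillis) {
            return null;
        }
        return queue.poll();
    }

    /** Counts triggers until the single penalized flow file is finally pulled. */
    static long countTasks(long penaltyMillis, long runScheduleMillis) {
        if (runScheduleMillis <= 0) {
            throw new IllegalArgumentException("run schedule must be > 0 in this model");
        }
        Deque<SimFlowFile> queue = new ArrayDeque<>();
        queue.add(new SimFlowFile(penaltyMillis));

        long tasks = 0;
        // The scheduler fires whenever the queue is non-empty, penalized or not.
        for (long now = 0; !queue.isEmpty(); now += runScheduleMillis) {
            tasks++;
            get(queue, now); // returns null (a wasted task) until the penalty has expired
        }
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println("1 ms schedule:  " + countTasks(30_000, 1));
        System.out.println("10 ms schedule: " + countTasks(30_000, 10));
        System.out.println("1 s schedule:   " + countTasks(30_000, 1_000));
    }
}
```

In this model the task count is roughly the penalty duration divided by the run schedule, so a longer run schedule trades a little latency after the penalty expires for far fewer empty triggers. For comparison, 192,569 tasks in about 30 seconds works out to roughly 6,400 triggers per second.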
JDP10101
  • Thanks. For this simple processor, I will change the run schedule to (Penalty Duration / 2), just to reduce the load on the CPU. I wonder if there is an annotation for the onTrigger function that can stop the call itself when there are no non-penalized FFs. – Rakesh Prasad Aug 19 '18 at 03:49
  • You can call `context.yield()` to manually "pause" the controller from scheduling the processor (think of it as penalizing the whole processor). Typically this is used when a processor hits an exception that would persist across the processing of other FFs (like a remote service being down). You can call it in the `FF == null` branch, but you may hit it accidentally if you run the processor with more than one task and a fast run duration. – JDP10101 Aug 19 '18 at 04:52
  • Thanks for the yield suggestion. It makes sense to use it, but what ends the processor's wait/sleep? I mean, if I put `if (ff==null) {context.yield();}` and the processor is no longer queued for an active thread, which event brings it back to life? If I do not penalize the file and go with the yield solution, the FF is still sitting somewhere, so how will NiFi know when to bring the processor back into service? – Rakesh Prasad Aug 19 '18 at 13:37
  • In the same place where you configure the penalty duration (the processor's configuration), you can also set the yield duration. The default is 1 second. Link to the docs: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#settings-tab – JDP10101 Aug 19 '18 at 15:45
  • I think Joe's explanation is correct. The framework normally wouldn't execute a processor at all when its queues are empty, but since there is a flow file there, it attempts to execute, finds that the flow file is penalized, and does nothing. I would not add a yield to your code unless you really want to slow down the whole processor due to some error condition. Just change the run schedule to even 1 ms or 10 ms, which will lighten the CPU load. – Bryan Bende Aug 20 '18 at 13:25
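
On the question of what wakes a yielded processor: going by the comments above, it is simply the yield duration running out. The following plain-Java sketch models that under loud assumptions: a toy scheduler that checks the processor once per simulated millisecond, and an onTrigger that yields whenever it finds only a penalized flow file. `YieldSimulation` and `triggerTimes` are hypothetical names, and none of this is NiFi's actual scheduling code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of yield-based wake-ups; not NiFi's scheduler. */
class YieldSimulation {

    /**
     * Returns the simulated times (in ms) at which the processor is triggered,
     * ending with the trigger that finally finds the flow file unpenalized.
     */
    static List<Long> triggerTimes(long penaltyMillis, long yieldMillis) {
        if (yieldMillis <= 0) {
            throw new IllegalArgumentException("yield duration must be > 0 in this model");
        }
        List<Long> times = new ArrayList<>();
        long yieldExpiresAt = 0;
        for (long now = 0; ; now++) {           // the scheduler checks every simulated ms
            if (now < yieldExpiresAt) {
                continue;                        // still yielded: skipped, no task is run
            }
            times.add(now);                      // onTrigger runs
            if (now >= penaltyMillis) {
                return times;                    // penalty expired: the flow file is processed
            }
            yieldExpiresAt = now + yieldMillis;  // models context.yield() in the null branch
        }
    }

    public static void main(String[] args) {
        System.out.println(triggerTimes(2_500, 1_000));
    }
}
```

With a 2.5 s penalty and a 1 s yield, the model triggers at 0, 1000, 2000, and 3000 ms, so the flow file is picked up at the first wake-up after its penalty ends rather than at the exact expiry time. The same delay would apply to any other flow file arriving while the processor is yielded, which is in line with the advice above to prefer a small run schedule.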