I'm tuning a NiFi flow that gathers metrics on other InvokeHTTP processors elsewhere within the same NiFi instance. The trouble I'm running into is that the volume of data pulled back is massive (576 KB per processor). I'd like to hit the NiFi API's processors endpoint with a more tailored call that pulls back a single data field instead of the full 576 KB diagnostics printout.
The API call I'm using is:
http://{hostname}.compute.internal:8800/nifi-api/processors/{processorID}/diagnostics
I'm making the API call using:
- a GenerateFlowFile processor containing a list of URLs
- a SplitJson processor to separate the URLs into individual flowfiles
- an EvaluateJsonPath processor to place each URL into a flowfile attribute
- an InvokeHTTP processor to hit NiFi's processors endpoint using the above structure
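For reference, the first three steps amount to roughly the following, written as a Python sketch rather than actual NiFi configuration. The function name, the `urls` key, the `url` attribute name, and the example host and processor IDs are placeholders made up for illustration; only the URL structure comes from the call above.

```python
import json

def build_diagnostics_urls(hostname, processor_ids, port=8800):
    """Return one diagnostics URL per processor ID, using the URL structure above."""
    base = f"http://{hostname}.compute.internal:{port}/nifi-api/processors"
    return [f"{base}/{pid}/diagnostics" for pid in processor_ids]

# Generate step: one JSON document holding the whole list (hypothetical values).
payload = json.dumps(
    {"urls": build_diagnostics_urls("nifi-host", ["proc-id-1", "proc-id-2"])}
)

# Split + evaluate steps: one record per URL, with the URL stored as an attribute.
flowfiles = [{"attributes": {"url": u}} for u in json.loads(payload)["urls"]]
```

Each resulting record then drives one GET against the processors endpoint.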
This flow works fine, but it introduces far more raw data into the flow (and the content repo) than I'd like, especially considering there is just one metric I want to collect from each processor (component.processorStatus.aggregateSnapshot.bytesOut). I've tried a few ways of including a SQL query in the message body, but the result is always the same: the endpoint returns the full 576 KB of diagnostics per processor.
Has anyone attempted to hit this endpoint with a tailored GET call that brings back just one field of data?
I've also been able to achieve the desired effect using ExecuteStreamCommand to invoke a bash script that pipes a curl call into jq to filter the result down to the one field, but this proved non-performant (each run takes several seconds, multiplied by dozens of API calls every few minutes), so I'd like to avoid that approach if at all possible.
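The filtering step itself is tiny; in Python, the equivalent of the curl-plus-jq pipeline would look roughly like the sketch below. The function names are hypothetical, the sample document is made up and trimmed to the one path of interest, and the sketch assumes the endpoint is reachable without authentication. Note that it still downloads the full response before filtering, so it only illustrates the client-side extraction, not a server-side filter.

```python
import json
from urllib.request import urlopen

# Path to the single metric of interest inside the diagnostics response.
FIELD_PATH = ("component", "processorStatus", "aggregateSnapshot", "bytesOut")

def extract_field(document, path=FIELD_PATH):
    """Walk nested dicts along path; return None if any key is missing."""
    node = document
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def fetch_bytes_out(url, timeout=10):
    """GET a diagnostics URL and keep only the bytesOut field."""
    with urlopen(url, timeout=timeout) as response:
        return extract_field(json.load(response))

# Made-up sample standing in for the real 576 KB response.
sample = {"component": {"processorStatus": {"aggregateSnapshot": {"bytesOut": 1024}}}}
print(extract_field(sample))
```

`fetch_bytes_out` is not called here because it needs a live NiFi instance; `extract_field` can be checked offline against any saved response.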