
I have a Kafka topic that receives different types of messages from different sources.

I would like to use the ExtractGrok processor to extract the message based on a regular expression/grok pattern.

How do I configure or run the processor with multiple regular expressions?

For example, the Kafka topic contains INFO, WARNING and ERROR log entries from different applications.

I would like to separate the messages by log level and place them into HDFS.
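For illustration, the entries in the topic might look like this (a hypothetical layout; the actual applications' formats may differ):

```text
2018-06-11 13:05:22,123 INFO  app-one  Started job
2018-06-11 13:05:23,456 ERROR app-two  Connection refused
2018-06-11 13:05:24,789 WARNING app-one  Retrying in 5s
```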


1 Answer


Instead of using the ExtractGrok processor, use the PartitionRecord processor in NiFi, as this processor:

  1. Evaluates one or more RecordPaths against each record in the incoming FlowFile.

  2. Groups each record with other "like records".

  3. Requires you to configure/enable the controller services:

    Record Reader: GrokReader

    Record Writer: your desired format

Then use the PutHDFS processor to store the flowfiles based on the loglevel attribute.
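As a sketch, the controller service and processor properties could look like this (the grok pattern and record path are assumptions about the log format, not values taken from the question):

```text
# GrokReader controller service
Grok Expression : %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{DATA:app} %{GREEDYDATA:message}

# PartitionRecord processor (dynamic property; the property name becomes the attribute name)
loglevel : /level
```

Each outgoing flowfile then carries a loglevel attribute (INFO, WARNING, or ERROR) that downstream processors can use.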

Flow:

1. ConsumeKafka processor
2. PartitionRecord processor
3. PutHDFS processor
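With the flow above, PutHDFS can route each partition into its own HDFS directory via NiFi expression language (the base path is illustrative, and this assumes PartitionRecord was configured with a dynamic property named loglevel):

```text
# PutHDFS processor
Directory : /data/logs/${loglevel}
```

The ${loglevel} expression resolves to the attribute PartitionRecord added to each flowfile, so INFO, WARNING, and ERROR records land in separate directories.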

Refer to this link, which describes all the steps to configure the PartitionRecord processor.

Refer to this link, which describes how to store partitions dynamically in HDFS directories using the PutHDFS processor.

  • I can only enter 1 grok expression. How do I enter multiple grok expressions? – ilovetolearn Jun 11 '18 at 13:05
  • You can keep one grok expression to match the LogLevel, then in PartitionRecord add a new property that matches your level record path. The PartitionRecord processor will then group all the like records into their respective flowfiles. – notNull Jun 11 '18 at 22:42
  • If the record does not match the first PartitionRecord, should I chain it to a second PartitionRecord so the flowfile passes through each grok expression specified in the processors? – ilovetolearn Jun 11 '18 at 23:57
  • PartitionRecord does **dynamic partitions** on the flowfile based on the record path provided. **No attribute** will be added if there is no record path in the message. You can use **RouteOnAttribute** processor to filter out (or) **UpdateAttribute** processor to add attribute with some default value to it. – notNull Jun 12 '18 at 01:17
  • How do I extract specific fields from the output? – ilovetolearn Jun 12 '18 at 06:14
  • You can use **ExtractText (add a new property with a regex matching the specific field) / EvaluateJsonPath (if the input is in JSON format)** processors to extract specific fields. If you want to query specific fields in the flowfile content, use the QueryRecord processor. Reference: https://community.hortonworks.com/articles/121794/running-sql-on-flowfiles-using-queryrecord-process.html – notNull Jun 12 '18 at 12:43