Nifi - Extracting Key Value pairs into new fields

Question

With Nifi I am trying to use the ReplaceText processor to extract key value pairs. The relevant part of the JSON file is the 'RuleName':

"winlog": {
    "channel": "Microsoft-Windows-Sysmon/Operational",
    "event_id": 3,
    "api": "wineventlog",
    "process": {
      "pid": 1640,
      "thread": {
        "id": 4452
      }
    },
    "version": 5,
    "record_id": 521564887,
    "computer_name": "SERVER001",
    "event_data": {
      "RuleName": "Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043"
    },
    "provider_guid": "{5790385F-C22A-43E0-BF4C-06F5698FFBD9}",
    "opcode": "Info",
    "provider_name": "Microsoft-Windows-Sysmon",
    "task": "Network connection detected (rule: NetworkConnect)",
    "user": {
      "identifier": "S-1-5-18",
      "name": "SYSTEM",
      "domain": "NT AUTHORITY",
      "type": "Well Known Group"
    }
  },

Within the ReplaceText processor I have this configuration:

"winlog.event_data.RuleName":"MitreRef=(.*),Technique=(.*),Tactic=(.*),Alert=(.*)"
"MitreRef":"$1","Technique":"$2","Tactic":"$3","Alert":"$4"

The first problem is that the new fields (MitreRef etc.) are not created. The second is that the key-value pairs may appear in any order in the original JSON, e.g. "RuleName": "Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043" or MitreRef=1043,Tactic=Command and Control,Technique=Commonly Used Port

Any ideas on how to proceed?

Do you have many of those `RuleName` entries in your file? And it's not clear what you want as a result: replace the whole JSON with those 3 key-value pairs? — daggett, Nov 17 '19 at 05:39

Endzeit · Answer 1 · 2019-11-18T17:53:22.803

Welcome to StackOverflow!

As your question is quite ambiguous, I'll try to guess what you aimed for.

Replacing string value of "RuleName" with JSON representation

I assume that you want to replace the entry

"RuleName": "Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043"

with something along the lines of

"RuleName": { 
    "Technique": "Commonly Used Port",
    "Tactic": "Command and Control",
    "MitreRef": "1043"
}

In this case you can grab basically the whole line and assume you have three groups of characters, each consisting of

A number of characters that are not the equals sign: ([^=]+)
The equals sign =
A number of characters that are not the comma sign: ([^,]+)

These groups in turn are separated by a comma: ,

Based on these assumptions you can write the following RegEx inside the Search Value property of the ReplaceText processor:

"RuleName"\s*:\s*"([^=]+)=([^,]+),([^=]+)=([^,]+),([^=]+)=([^,]+)"

With this, you grab the whole line and build a group for every important data point. Based on the groups you may set the Replacement Value to:

"RuleName": {
    "${'$1'}": "${'$2'}",
    "${'$3'}": "${'$4'}",
    "${'$5'}": "${'$6'}" 
}

Resulting in the above mentioned JSON object.
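
To try the pattern outside of NiFi, here is a minimal sketch that uses Python's `re` module as a stand-in for the processor. The Search Value is copied unchanged; the Replacement Value is rewritten with Python back-references (`\1` instead of `$1`), and the helper names `rewrite` and `as_dict` are made up for this illustration. NiFi uses Java regular expressions, so treat this as a quick check of the pattern, not of the processor itself.

```python
import json
import re

# Search Value from the answer, unchanged.
SEARCH = r'"RuleName"\s*:\s*"([^=]+)=([^,]+),([^=]+)=([^,]+),([^=]+)=([^,]+)"'

# Python spelling of the Replacement Value: \1..\6 stand in for $1..$6.
REPLACEMENT = r'"RuleName": {"\1": "\2", "\3": "\4", "\5": "\6"}'


def rewrite(line):
    """Apply the search/replace pair to one line (hypothetical helper)."""
    return re.sub(SEARCH, REPLACEMENT, line)


def as_dict(fragment):
    """Wrap the fragment in braces so it parses as a JSON document."""
    return json.loads("{" + fragment + "}")["RuleName"]


first = as_dict(rewrite(
    '"RuleName": "Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043"'
))
# Same pairs in a different order: the keys are captured, not hard-coded.
second = as_dict(rewrite(
    '"RuleName": "MitreRef=1043,Tactic=Command and Control,Technique=Commonly Used Port"'
))

print(first)
print(second)
```

Because the keys are captured groups, the order of the pairs in the input does not matter; the object simply lists them in the order they appeared.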

Some remarks

The RegEx assumes that the entry is on a single line and does NOT work when it is split across multiple lines, e.g.
```
"RuleName":  
"Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043"
```
The RegEx assumes there are exactly three "items" inside the value of RuleName and does NOT work with a different number of "items".
In case your JSON file can grow large, you may want to avoid the Entire text evaluation mode, as this loads the content into a buffer and routes the FlowFile to the failure output if the file is too large. In that case I recommend using the Line-by-Line mode.
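
These limits can be checked with a small Python sketch. It assumes that searching each line separately approximates Line-by-Line mode and that searching the whole string approximates Entire text mode; both helper functions are hypothetical stand-ins, not NiFi APIs.

```python
import re

SEARCH = re.compile(
    r'"RuleName"\s*:\s*"([^=]+)=([^,]+),([^=]+)=([^,]+),([^=]+)=([^,]+)"'
)

two_pairs = '"RuleName": "Technique=Commonly Used Port,MitreRef=1043"'
four_pairs = ('"RuleName": "Technique=Commonly Used Port,'
              'Tactic=Command and Control,MitreRef=1043,Foo=Bar"')
# The split entry from the remark above: key on one line, value on the next.
split_entry = ('"RuleName":  \n'
               '"Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043"')


def matches_line_by_line(text):
    """Stand-in for Line-by-Line mode: every line is searched on its own."""
    return any(SEARCH.search(line) for line in text.splitlines())


def matches_entire_text(text):
    """Stand-in for Entire text mode: the whole content is searched at once."""
    return SEARCH.search(text) is not None


for name, text in [("two pairs", two_pairs),
                   ("four pairs", four_pairs),
                   ("split entry", split_entry)]:
    print(name, matches_line_by_line(text), matches_entire_text(text))
```

In this stand-in, two or four pairs never match. The split entry fails only when lines are searched separately; since `\s` also matches line breaks, a search over the whole text does find it. Whether that carries over to your flow depends on the Evaluation Mode you choose.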

Allowing a fourth additional value

In case there might be a fourth additional value, you may adjust the RegEx in the Search Value property. You can add (,([^=]+)=([^,]+))? to the previous expression, which roughly translates to:

( )? - match what is in the bracket zero or one times
, - match the character comma
([^=]+)=([^,]+) - followed by the group of characters as explained above

The whole RegEx will look like this:

"RuleName"\s*:\s*"([^=]+)=([^,]+),([^=]+)=([^,]+),([^=]+)=([^,]+)(,([^=]+)=([^,]+))?"

To allow the new value to be used you have to adjust the replacement value as well. You can use the Expression Language available in most NiFi processor properties to decide whether to add another item to the JSON object or not.

${'$7':isEmpty():ifElse(
        '',
        ${literal(', "'):append(${'$8'}):append('": '):append('"'):append(${'$9'}):append('"')}
)}

This expression checks whether the seventh RegEx group matched and appends either an empty string or the found values.
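
To see when the seventh group is present, the extended pattern can be run against a three-pair and a four-pair value. This is a sketch with Python's `re` module standing in for NiFi's Java regex engine; Python reports an unmatched optional group as `None`, which corresponds to the "empty" case the expression tests for.

```python
import re

SEARCH = re.compile(
    r'"RuleName"\s*:\s*"([^=]+)=([^,]+),([^=]+)=([^,]+),([^=]+)=([^,]+)'
    r'(,([^=]+)=([^,]+))?"'
)

three = '"RuleName": "Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043"'
four = ('"RuleName": "Technique=Commonly Used Port,'
        'Tactic=Command and Control,MitreRef=1043,Foo=Bar"')

m3 = SEARCH.search(three)
m4 = SEARCH.search(four)

# Group 7 is the whole optional part, groups 8 and 9 are its key and value.
print(m3.group(7, 8, 9))
print(m4.group(7, 8, 9))
```

Group 7 only exists to make the comma and the fourth pair optional as a unit; the replacement uses groups 8 and 9 for the actual key and value.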

With this modification included the whole replacement value will look like the following:

"RuleName": {
    "${'$1'}": "${'$2'}",
    "${'$3'}": "${'$4'}",
    "${'$5'}": "${'$6'}"
    ${'$7':isEmpty():ifElse(
        '',
        ${literal(', "'):append(${'$8'}):append('": '):append('"'):append(${'$9'}):append('"')}
    )}
}
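
The conditional part of this replacement value can be mirrored in plain Python to see why the comma lives inside the optional piece. The functions `optional_pair` and `build_object` are hypothetical names for this sketch and only imitate the string concatenation; they do not evaluate NiFi Expression Language.

```python
import json


def optional_pair(g7, g8, g9):
    """Python stand-in for the isEmpty():ifElse(...) expression above."""
    if not g7:  # None or '' means the optional group did not match
        return ''
    # Same pieces, in the same order, as the literal(...):append(...) chain.
    return ', "' + g8 + '": ' + '"' + g9 + '"'


def build_object(pairs, g7=None, g8=None, g9=None):
    """Assemble the replacement object from three pairs plus the optional tail."""
    body = ', '.join('"%s": "%s"' % pair for pair in pairs)
    return '{' + body + optional_pair(g7, g8, g9) + '}'


pairs = [("Technique", "Commonly Used Port"),
         ("Tactic", "Command and Control"),
         ("MitreRef", "1043")]

print(json.loads(build_object(pairs)))
print(json.loads(build_object(pairs, ",Foo=Bar", "Foo", "Bar")))
```

Since the leading comma is emitted together with the fourth pair, the three-pair case never ends up with a trailing comma, and both variants stay valid JSON.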

Regarding multiple occurrences

The ReplaceText processor replaces every occurrence where the RegEx matches. With the settings from the previous section, the following example input

{
    "event_data": {
      "RuleName": "Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043,Foo=Bar"
    },
    "RuleName": "Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043"
}

will result in the following:

{
    "event_data": {
        "RuleName": {
            "Technique": "Commonly Used Port",
            "Tactic": "Command and Control",
            "MitreRef": "1043",
            "Foo": "Bar"
        }
    },
    "RuleName": {
        "Technique": "Commonly Used Port",
        "Tactic": "Command and Control",
        "MitreRef": "1043"
    }
}
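
The same input can be pushed through a Python approximation of the processor to confirm that both entries are rewritten in one pass. This is a sketch under the assumption that `re.subn` over the whole string behaves like a single ReplaceText run over the content; the `replacement` function is a hypothetical Python counterpart of the Replacement Value.

```python
import json
import re

SEARCH = re.compile(
    r'"RuleName"\s*:\s*"([^=]+)=([^,]+),([^=]+)=([^,]+),([^=]+)=([^,]+)'
    r'(,([^=]+)=([^,]+))?"'
)


def replacement(match):
    """Build the JSON object for one match; unmatched groups become ''."""
    g = match.groups(default='')
    # g[6] is group 7 (the optional part), g[7] and g[8] its key and value.
    extra = f', "{g[7]}": "{g[8]}"' if g[6] else ''
    return (f'"RuleName": {{"{g[0]}": "{g[1]}", '
            f'"{g[2]}": "{g[3]}", "{g[4]}": "{g[5]}"{extra}}}')


document = '''{
    "event_data": {
      "RuleName": "Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043,Foo=Bar"
    },
    "RuleName": "Technique=Commonly Used Port,Tactic=Command and Control,MitreRef=1043"
}'''

rewritten, count = SEARCH.subn(replacement, document)
result = json.loads(rewritten)

print(count)
print(json.dumps(result, indent=4))
```

Parsing the result with `json.loads` doubles as a sanity check that the rewritten content is still valid JSON.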

Example template

You may download a template I created that includes the above processor from gist.

Thank you for your detailed and very useful response. I was not completely clear with the question but you guessed correctly the intent. Normally there will be 3 or 4 of these Key value pairs, so I assume that the best way to handle this would be to chain two of the suggested processors together, the first for 3 values and the second for 4 in the event of failure? — user640887, Nov 18 '19 at 02:48
@user640887 I just added a new paragraph to allow for the fourth optional value. In case it is not present only the three values are mapped; as in the original answer. — Endzeit, Nov 18 '19 at 17:47
@user640887 If you have any questions left, feel free to ask them. Please accept the answer if it helped you. — Endzeit, Nov 18 '19 at 17:48
