6

I've got data coming from Kafka and I want to send it to Elasticsearch. My logs look like this, with tags:

<TOTO><ID_APPLICATION>APPLI_A|PRF|ENV_1|00</ID_APPLICATION><TN>3</TN></TOTO>

I'm trying to parse it with grok using grok debugger:

\<ID_APPLICATION\>%{WORD:APPLICATION}\|%{WORD:PROFIL}\|%{WORD:ENV}\|%{WORD:CODE}\</ID_APPLICATION\>\<TN\>%{NUMBER:TN}\</TN\>

It works, but sometimes the log has a new field like this (the one with the tag <TP>):

<TOTO><ID_APPLICATION>APPLI_A|PRF|ENV_1|00</ID_APPLICATION><TN>3</TN><TP>new</TP></TOTO>

I'd like to match lines both with this field (the TP tag) and without it. How can I do that?

baudsp
David

2 Answers

11

If you have an optional field, you can match it by wrapping its named capture in an optional non-capturing group:

(?:<TP>%{WORD:TP}</TP>)?
^^^                    ^

The non-capturing group (?:...) does not store a submatch of its own and is used for grouping only; the ? quantifier after it matches the group 0 or 1 times, which makes it optional. When the tag is present, %{WORD:TP} creates a TP field holding the matched word. If the tag is absent, the value will be null.

So, the whole pattern will look like:

<ID_APPLICATION>%{WORD:APPLICATION}\|%{WORD:PROFIL}\|%{WORD:ENV}\|%{WORD:CODE}</ID_APPLICATION><TN>%{NUMBER:TN}</TN>(?:<TP>%{WORD:TP}</TP>)?
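
To check how the optional group behaves on both log shapes outside Logstash, here is a minimal Python sketch. It is only an approximation: it assumes %{WORD} can be stood in for by \w+ and simplifies %{NUMBER} to a plain integer-or-decimal pattern, it uses Python's (?P<name>...) syntax in place of grok's %{PATTERN:name}, and the function name parse_line is hypothetical.

```python
import re

# Plain-regex approximation of the grok pattern (assumption: WORD ~ \w+,
# NUMBER ~ optional sign, digits, optional decimal part).
PATTERN = re.compile(
    r"<ID_APPLICATION>(?P<APPLICATION>\w+)\|(?P<PROFIL>\w+)\|(?P<ENV>\w+)\|(?P<CODE>\w+)</ID_APPLICATION>"
    r"<TN>(?P<TN>[+-]?\d+(?:\.\d+)?)</TN>"
    r"(?:<TP>(?P<TP>\w+)</TP>)?"  # optional non-capturing group around the TP tag
)


def parse_line(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


without_tp = "<TOTO><ID_APPLICATION>APPLI_A|PRF|ENV_1|00</ID_APPLICATION><TN>3</TN></TOTO>"
with_tp = "<TOTO><ID_APPLICATION>APPLI_A|PRF|ENV_1|00</ID_APPLICATION><TN>3</TN><TP>new</TP></TOTO>"

print(parse_line(without_tp))  # TP is None here
print(parse_line(with_tp))     # TP is "new" here
```

Both lines match with the same pattern; only the TP entry differs, which mirrors the null value described above.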
Wiktor Stribiżew
0

This is the filter I came up with, using the Heroku App and this Documentation on how to use grok operators.

I created my own named capture, called "content", which retrieves whatever is inside your TP tags.

\<ID_APPLICATION\>%{WORD:APPLICATION}\|%{WORD:PROFIL}\|%{WORD:ENV}\|%{WORD:CODE}\<\/ID_APPLICATION\>\<TN>%{NUMBER:TN}\<\/TN\>(\<TP\>(?<content>(.)*)\<\/TP\>)?

Basically, I just added an optional tag to your pattern.

(<TP> ... </TP>)? 

To retrieve the content, which I assume can be anything, I added the following inside the optional tags.

(?<content>(.)*)
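
The practical difference between capturing "anything" and capturing a single word can be sketched in Python. This is an illustration under assumptions, not the grok engine itself: the pattern is cut down to the TN and TP tags, \w+ stands in for %{WORD}, Python's (?P<content>...) replaces (?<content>...), and a lazy .*? is swapped in for (.)* so the capture stops at the first closing tag. The names TP_ANY and TP_WORD are hypothetical.

```python
import re

# Optional TP tag whose content may be any characters (lazy, stops at first </TP>).
TP_ANY = re.compile(r"<TN>(?P<TN>\d+)</TN>(?:<TP>(?P<content>.*?)</TP>)?")
# Same tail, but the content is restricted to word characters, as with %{WORD}.
TP_WORD = re.compile(r"<TN>(?P<TN>\d+)</TN>(?:<TP>(?P<content>\w+)</TP>)?")

# TP content containing a space and a hyphen, so it is not a single word.
line = "<TN>3</TN><TP>new value-2</TP>"

any_content = TP_ANY.search(line).group("content")
word_content = TP_WORD.search(line).group("content")

print(any_content)   # the full tag content
print(word_content)  # None: the optional group is skipped, the line still matches
```

With the word-only version the line still matches, but the optional group silently matches zero times, so the content is lost. That is the case a catch-all capture is meant to cover.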
vdolez