I have a file with the following format:

10302\t<document>.....</document>   
12303\t<document>.....</document>   
10054\t<document>.....</document>   
10034\t<document>.....</document>   

As you can see, each line holds two values separated by a tab character. I need to:

  • index the first token (e.g. 10302, 12303...) as ID
  • extract (and then index) some information from the second token (the XML document). In other words, the second token would be passed to the xml filter to extract that information

Is it possible to separate the two values using the kv filter? Ideally, for each line, I should end up with a document like this:

id:10302       
msg:<document>....</document>

I could use a grok filter, but I'd like to avoid regexes, since the field detection is very easy and could be accomplished with simple key-value logic. However, with plain kv detection I end up with the following:

"10302": <document>.....</document>   
"12303": <document>.....</document>   
"10054": <document>.....</document>   
"10034": <document>.....</document>  

and this is not what I need.

Andrea
  • Could you add the configuration of your kv filter? – baudsp Nov 02 '16 at 16:19
  • I do not have it because I don't know how to say "take the key and create an attribute id with that key as value, then take the value and create an attribute message with that value" – Andrea Nov 02 '16 at 17:50
  • OK. I don't think it's possible to use kv for the job you want to do, since there is no possible key for the id (10302, 10303, 10304...). But grok would be perfectly workable with `%{INT:ID}\t%{GREEDYDATA:msg}` – baudsp Nov 03 '16 at 08:49
  • Many thanks, I think I came to the same conclusion. If you put your comment in an answer I will accept it. – Andrea Nov 03 '16 at 10:10
  • You're welcome. I added an answer, with an additional anchor (`^`) in the regex for better performance (in theory) – baudsp Nov 03 '16 at 10:23

1 Answer

As far as I know, it is not possible to use kv for the job you want to do. kv expects each entry to be a key followed by a value, but here nothing precedes the id (10302, 10303, 10304...) that could serve as its key, so the id itself ends up being treated as the key.

This grok configuration would work, assuming each id and its document are on the same line:

grok {
  match => { "message" => "^%{INT:ID}\t%{GREEDYDATA:msg}"}
}
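
Since the question goes on to feed the XML part to the xml filter, the two steps could be chained roughly as follows. This is an untested sketch under a few assumptions: the target field name `doc` is hypothetical, and the capture is named `id` rather than `ID` only to match the output asked for in the question.

```
filter {
  # Split each line at the tab: leading integer -> id, remainder -> msg
  grok {
    match => { "message" => "^%{INT:id}\t%{GREEDYDATA:msg}" }
  }
  # Parse the XML held in msg; the parsed structure is stored under
  # the hypothetical field "doc"
  xml {
    source => "msg"
    target => "doc"
  }
}
```

If only a few values from the document are needed rather than the whole parsed structure, the xml filter's `xpath` option may be a better fit than storing everything under `target`.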
baudsp