I have logs that look like this:

system,info,account user admin logged out from 192.168.1.9 via local
system,info log rule added by admin

Every line begins with a comma-separated list of topics, and the list ends at the first space. There can be one, two, three, or more topics in the list. I need to get the topics as a group of values, e.g. [ "system", "info", "account" ] for the first line and [ "system", "info" ] for the second.

I tried extracting the list first with ^\S+ and then applying [^,]+ to the result of the first regex. It works, but is there a way to do this with a single regex?
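For reference, the two-step approach described above can be sketched in Python. This is only an illustration of the logic: the function name `extract_topics` is made up for this example, and Python's `re` module is not the Oniguruma engine that Grok uses.

```python
import re


def extract_topics(line: str) -> list[str]:
    """Return the comma-separated topics at the start of a log line."""
    # Step 1: grab everything up to the first whitespace character.
    head = re.match(r"^\S+", line)
    if head is None:
        return []
    # Step 2: collect the comma-free runs inside that prefix.
    return re.findall(r"[^,]+", head.group(0))


if __name__ == "__main__":
    print(extract_topics("system,info,account user admin logged out from 192.168.1.9 via local"))
    print(extract_topics("system,info log rule added by admin"))
```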

I want a single regex because I'm going to use it in a Grok pattern to add these topics as tags. Grok uses the Oniguruma regex engine.

  • It will require two steps in fact: 1) grabbing the comma-delimited part and 2) splitting with comma to get the list. – Wiktor Stribiżew Jun 06 '23 at 10:43
  • It's not easy in one go :-( But if Oniguruma's engine works like ECMAScript or .NET, then you could use a positive lookbehind with a variable-length pattern (not supported by PCRE or most other engines) like this: `(?<=^(?:\S+,)*)\w+` https://regex101.com/r/ZhYqaC/1 – Patrick Janser Jun 06 '23 at 12:18
  • @PatrickJanser thank you so much, but grok in Logstash reports "invalid pattern in look-behind", so I'm sorry, your regex doesn't work in my case. – Constantin Dolinin Jun 07 '23 at 13:29
  • @ConstantinDolinin Well, I supposed there was only a 30% chance it would work with this regex engine. Unfortunately, I haven't got another idea. If Wiktor says you can't do it in one go, you can believe him! He's a specialist in regex ;-) – Patrick Janser Jun 07 '23 at 17:34
  • If I read the Oniguruma syntax specs correctly, something like `\G,?(\w+)` should work to capture only the comma-separated words at the beginning of the string. – oriberu Jun 08 '23 at 09:24

1 Answer

The solution was to use mutate { split ... merge ... } after grokking the topics part out of the message.

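filter {
    grok {
        patterns_dir => [ "/etc/logstash/patterns" ]
        # Capture everything before the first space as mttopics;
        # the rest of the line replaces the original message.
        match => { "message" => "(?<mttopics>^\S+) %{GREEDYDATA:message}" }
        overwrite => [ "message" ]
    }
    mutate {
        # Turn "system,info,account" into an array, append it to tags,
        # then drop the temporary field.
        split => { "mttopics" => "," }
        merge => { "tags" => "mttopics" }
        remove_field => [ "mttopics" ]
    }
}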
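
To show what the filter above is expected to produce for the two sample lines, here is a rough Python emulation. It is a sketch under assumptions, not Logstash itself: `apply_filter` and `LINE_RE` are names invented for this example, `.*` stands in for `GREEDYDATA`, and the merge step is modeled as plain list concatenation.

```python
import re

# Approximation of the grok pattern: topics up to the first space, then the rest.
LINE_RE = re.compile(r"(?P<mttopics>^\S+) (?P<message>.*)")


def apply_filter(event: dict) -> dict:
    """Emulate the grok + mutate steps on a dict-shaped event."""
    m = LINE_RE.search(event["message"])
    if m is None:
        # Logstash would tag the event with _grokparsefailure;
        # this sketch simply returns the event unchanged.
        return event
    result = dict(event)
    # overwrite => [ "message" ]: keep only the text after the topics.
    result["message"] = m.group("message")
    # split => { "mttopics" => "," }
    topics = m.group("mttopics").split(",")
    # merge => { "tags" => "mttopics" } (assumed to append to existing tags)
    result["tags"] = list(event.get("tags", [])) + topics
    # remove_field: mttopics is never stored on the result.
    return result


if __name__ == "__main__":
    for line in (
        "system,info,account user admin logged out from 192.168.1.9 via local",
        "system,info log rule added by admin",
    ):
        print(apply_filter({"message": line}))
```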