
I'm trying to track down every _grokparsefailure on my Logstash box.

It seems the only two culprits are NGINX log lines that trip up my NGINXACCESS pattern:

 %{IPORHOST:clientip} %{NGUSER:ident} %{NGUSER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:"(?:%{URI:referrer}|-)"|%{QS:referrer}) %{QS:agent}

The following are two examples of messages that get tagged as grok failures:

172.31.0.2 - - [30/Jul/2015:15:10:49 +1000] "GET /web-app/[EXPAND] HTTP/1.1" 404 6432 "-" "Amazon CloudFront" "web-app.mydomain.com" "127.0.0.1" 

172.31.0.2 - - [30/Jul/2015:14:13:52 +1000] "GET /web-app/show?wid=5540cfbc3asdf034ct=&domain=apptest.mydomain.com&ttl=\x5C%2230\x5C%22&filter_id=14026&unique_id=1 HTTP/1.1" 200 11400 "http://apptest.mydomain.com/"; "Amazon CloudFront" "apptest.mydomain.com" "127.0.0.1" 

Going through the grok debugger, the failure comes from %{URIPATHPARAM:request} hitting the brackets in [EXPAND] in the first example and the backslashes in \x5C%2230\x5C%22 in the second. That is, if I remove [, ], or \ from the inputs, grok matches fine.

I can't work out how to get the URIPATHPARAM grok pattern to deal with brackets and backslashes like these. Any ideas?

autonomy

2 Answers


Generally I would recommend using another pattern, as @Alain suggested. If you still want to solve this with a more exact pattern, you can use a grok field like this:

(?<request>(?:/[A-Za-z0-9$.+!*'(){}\[\]\\,~:;=&@#?%_\-]*)+)

(This is a mix of URIPATH and URIPARAM with backslashes and brackets.)
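If you want to sanity-check the character class outside Logstash, a small Python sketch like the one below may help. It is only an approximation: Python's `re` needs `(?P<request>...)` where grok (Oniguruma) accepts `(?<request>...)`, and the `STRICT` class is a hypothetical stand-in made by deleting the brackets and backslash from the set, not the real URIPATHPARAM definition.

```python
import re

# The request field from above, rewritten with Python's (?P<name>...) group syntax.
REQUEST = re.compile(r"(?P<request>(?:/[A-Za-z0-9$.+!*'(){}\[\]\\,~:;=&@#?%_\-]*)+)")

# Hypothetical stricter class: the same set minus [, ] and \
# (an illustration only, not the shipped URIPATHPARAM pattern).
STRICT = re.compile(r"(?:/[A-Za-z0-9$.+!*'(){},~:;=&@#?%_\-]*)+")

# Request paths taken from the two failing log lines in the question.
paths = [
    "/web-app/[EXPAND]",
    r"/web-app/show?wid=5540cfbc3asdf034ct=&domain=apptest.mydomain.com"
    r"&ttl=\x5C%2230\x5C%22&filter_id=14026&unique_id=1",
]

# For each path: (matches relaxed class, matches strict class)
results = [
    (REQUEST.fullmatch(p) is not None, STRICT.fullmatch(p) is not None)
    for p in paths
]

for path, (relaxed_ok, strict_ok) in zip(paths, results):
    print(f"relaxed={relaxed_ok} strict={strict_ok} {path[:40]}")
```

Both paths should match the relaxed class in full and fail the stricter one, which mirrors what the grok debugger shows when the brackets or backslashes are removed.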

The entire grok pattern would look like this:

%{IPORHOST:clientip} %{NGUSER:ident} %{NGUSER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} (?<request>(?:/[A-Za-z0-9$.+!*'(){}\[\]\\,~:;=&@#?%_\-]*)+) HTTP/%{NUMBER:httpversion}" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:"(?:%{URI:referrer}|-)"|%{QS:referrer}) %{QS:agent}

This works for both of your given examples.

However, there is another issue with your inputs. The second one has a semicolon after its referrer ("http://apptest.mydomain.com/";), which the first one does not. You'll have to take care of that.

So the pattern needs an optional semicolon, (?:;|), after the referrer (shown here with the looser %{DATA:request} for the request field):

%{IPORHOST:clientip} %{NGUSER:ident} %{NGUSER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:"(?:%{URI:referrer}|-)"|%{QS:referrer})(?:;|) %{QS:agent}
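The effect of the optional semicolon can be tried in isolation with a rough Python sketch. The `[^"]*` groups are a simplification standing in for the `%{URI}`/`%{QS}` handling, so this only illustrates the `(?:;|)` part, not the full grok pattern.

```python
import re

# Quoted referrer, an optional stray semicolon, then the quoted user agent.
# [^"]* is a simplified stand-in for the referrer and agent sub-patterns.
TAIL = re.compile(r'"(?P<referrer>[^"]*)"(?:;|) "(?P<agent>[^"]*)"')

# Referrer/agent portions of the two log lines from the question.
tails = [
    '"-" "Amazon CloudFront"',
    '"http://apptest.mydomain.com/"; "Amazon CloudFront"',
]

parsed = [TAIL.match(t).groupdict() for t in tails]
for fields in parsed:
    print(fields)
```

Writing the group as `;?` would behave the same way; `(?:;|)` is kept here only to match the pattern above.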
hurb

As you've discovered, URIPATH doesn't allow for brackets. Since you have/want brackets, you'll need to use something else. How about %{NOTSPACE}?
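To see why this works, note that NOTSPACE is just `\S+`, so anything up to the next space is accepted. A rough Python sketch of the request part of the line, with a simplified version regex standing in for `%{NUMBER}` and a shortened copy of the second example URL:

```python
import re

# Rough stand-in for: "%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}"
# \S+ is what NOTSPACE expands to; the version regex is a simplification of NUMBER.
REQUEST_LINE = re.compile(
    r'"(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>\d+(?:\.\d+)?)"'
)

# Request portions of the two log lines (the second URL is shortened here).
lines = [
    '"GET /web-app/[EXPAND] HTTP/1.1"',
    r'"GET /web-app/show?ttl=\x5C%2230\x5C%22&filter_id=14026 HTTP/1.1"',
]

requests = [REQUEST_LINE.match(line).group("request") for line in lines]
for request in requests:
    print(request)
```

The trade-off is that `\S+` does no validation at all, so malformed requests will be captured rather than tagged as failures.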

Alain Collins
  • Correct answer as always Alain ;) I just liked to puzzle over that regex a little bit. – hurb Jul 29 '15 at 17:10