I have an nginx access_log Input that receives logs in JSON format. I have been trying to get the JSON extractors working, but to no avail.

Firstly, I was following this official Graylog tutorial: https://www.graylog.org/videos/json-extractor

This is a sample full message that comes in:

MyHost nginx: { "timestamp": "1658474614.043", "remote_addr": "x.x.x.x.x", "body_bytes_sent": 229221, "request_time": 0.005, "response_status": 200, "request": "GET /foo/bar/1999/09/sth.jpeg HTTP/2.0", "request_method": "GET", "host": "www…somesite.com", "upstream_cache_status": "", "upstream_addr": "x.x.x.x.x:xxx", "http_x_forwarded_for": "", "http_referrer": "https://www.somesite.com/foo/bar/woo/boo/moo", "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36", "http_version": "HTTP/2.0", "nginx_access": true }

It's then extracted into a json field using the following regex: nginx:\s+(.*)

The json field then looks like this:

{ "timestamp": "1658474614.043", "remote_addr": "x.x.x.x.x", "body_bytes_sent": 229221, "request_time": 0.005, "response_status": 200, "request": "GET /foo/bar/1999/09/sth.jpeg HTTP/2.0", "request_method": "GET", "host": "www…somesite.com", "upstream_cache_status": "", "upstream_addr": "x.x.x.x.x:xxx", "http_x_forwarded_for": "", "http_referrer": "https://www.somesite.com/foo/bar/woo/boo/moo", "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36", "http_version": "HTTP/2.0", "nginx_access": true }
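
To check the regex and the JSON outside Graylog, the same two steps can be reproduced in a few lines of Python. This is only a rough sketch: the message is a shortened, hypothetical version of the sample above, and Python's re and json modules stand in for the Graylog extractors, which may behave differently in edge cases.

```python
import json
import re

# Shortened, hypothetical version of the sample message (placeholder values).
raw = (
    'MyHost nginx: { "timestamp": "1658474614.043", "response_status": 200, '
    '"request_method": "GET", "nginx_access": true }'
)

# Same pattern as the regex extractor: capture everything after "nginx:" plus whitespace.
match = re.search(r"nginx:\s+(.*)", raw)
payload = match.group(1) if match else None

# Parse the captured text as JSON, roughly what the JSON extractor does next.
fields = json.loads(payload)
print(fields["timestamp"], fields["response_status"])
```

If json.loads raises an error here, the captured text is not valid JSON and the problem is in the regex or in the nginx log format rather than in the JSON extractor.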

However, from here on things only go downhill. I have set up a basic default JSON extractor without changing any options, and when I click "Try", the preview shows the correct output.

Sadly, once I add this extractor, messages stop showing up in my Input. There has to be an error somewhere, but I couldn't find anything in /var/log/graylog-server/server.log.

Hope someone will help me figure this out!

2 Answers

I had the same issue. Graylog has its own timestamp field, so the timestamp key in your nginx JSON conflicts with it. Try setting the key prefix to _ in your extractor, so that the nginx timestamp is stored as _timestamp and no longer collides with Graylog's timestamp field.
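
The effect of the key prefix can be pictured with a small Python sketch. It is an illustration only, not Graylog code: the RESERVED set is an assumed subset of the fields Graylog sets on every message, and apply_prefix is a hypothetical helper that mimics what the extractor's key prefix option does to field names.

```python
# Assumed subset of the field names Graylog sets on every message (illustration only).
RESERVED = {"timestamp", "source", "message"}


def apply_prefix(extracted: dict, prefix: str = "_") -> dict:
    """Return a copy of the extracted fields with the prefix added to every key."""
    return {f"{prefix}{key}": value for key, value in extracted.items()}


# Two of the keys from the nginx JSON in the question.
extracted = {"timestamp": "1658474614.043", "request_method": "GET"}

collisions = RESERVED & set(extracted)   # keys that would overwrite built-in fields
prefixed = apply_prefix(extracted)
remaining = RESERVED & set(prefixed)     # collisions left after prefixing

print(sorted(collisions), sorted(prefixed), sorted(remaining))
```

Without the prefix, the string "1658474614.043" would be written into the built-in timestamp field. With it, the value lands in a separate field and the message keeps the timestamp Graylog assigned.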

Art3A

Since the link to the solution has been removed by a moderator, here's a pipeline that ultimately got the job done:

rule "parse the json log entries"
when has_field("json")
then

 let json_tree = parse_json(to_string($message.json));

 let json_fields = select_jsonpath(json_tree, {
   time: "$.timestamp",
   remote_addr: "$.remote_addr",
   body_bytes_sent: "$.body_bytes_sent",
   request_time: "$.request_time",
   response_status: "$.response_status",
   request: "$.request",
   request_method: "$.request_method",
   host: "$.host",
   upstream_cache_status: "$.upstream_cache_status",
   upstream_addr: "$.upstream_addr",
   http_x_forwarded_for: "$.http_x_forwarded_for",
   http_referrer: "$.http_referrer",
   http_user_agent: "$.http_user_agent",
   http_version: "$.http_version",
   nginx_access: "$.nginx_access"
 });

 // Add two hours (7200 seconds) to compensate for the timezone difference; adjust to your needs.
 // Only the first 10 characters (whole seconds) of the epoch string are used.
 let s_epoch = to_string(json_fields.time);
 let s = substring(s_epoch, 0, 10);
 let ts_millis = (to_long(s) + 7200) * 1000;
 let new_date = parse_unix_milliseconds(ts_millis);

 set_field("date", new_date);

 set_field("remote_addr", to_string(json_fields.remote_addr));
 set_field("body_bytes_sent", to_double(json_fields.body_bytes_sent));
 set_field("request_time", to_double(json_fields.request_time));
 set_field("response_status", to_double(json_fields.response_status));
 set_field("request", to_string(json_fields.request));
 set_field("request_method", to_string(json_fields.request_method));
 set_field("host", to_string(json_fields.host));
 set_field("upstream_cache_status", to_string(json_fields.upstream_cache_status));
 set_field("upstream_addr", to_string(json_fields.upstream_addr));
 set_field("http_x_forwarded_for", to_string(json_fields.http_x_forwarded_for));
 set_field("http_referrer", to_string(json_fields.http_referrer));
 set_field("http_user_agent", to_string(json_fields.http_user_agent));
 set_field("http_version", to_string(json_fields.http_version));
 set_field("nginx_access", to_bool(json_fields.nginx_access));

end
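
The timestamp arithmetic in the rule is easy to get wrong, so here is the same calculation as a small Python sketch. It only mirrors the steps of the rule (take the first 10 characters, add the offset, convert to milliseconds); shifted_millis and OFFSET_SECONDS are hypothetical names, and the 7200-second offset is the value from the rule, not a recommendation.

```python
from datetime import datetime, timezone

OFFSET_SECONDS = 7200  # the two-hour shift used in the rule; adjust to your needs


def shifted_millis(epoch_text: str, offset_seconds: int = OFFSET_SECONDS) -> int:
    """Mirror the rule: keep the first 10 characters, add the offset, convert to ms."""
    whole_seconds = int(epoch_text[:10])  # "1658474614.043" -> 1658474614
    return (whole_seconds + offset_seconds) * 1000


millis = shifted_millis("1658474614.043")
as_date = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(millis, as_date.isoformat())
```

Note that cutting the string at 10 characters drops the fractional part (.043), so the resulting date has a resolution of one second.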

Note that you still have to configure an extractor. In this particular example the original message looks like this: nginx: {json}. To reduce it to the JSON alone, configure an extractor that strips the prefix and stores the rest in the json field, for example with the regex from the question: nginx:\s+(.*)

That's all. You may need to adjust it a bit for your setup, but it should work for most use cases.

If you are interested in the entire discussion that led to this solution, see: https://community.graylog.org/t/failed-to-index-1-messages-failed-to-parse-field-datetime-of-type-date-in-document/24960/6

  • I'm sure it's OK to repost this - this looks like a good expansion of the original post. However the normal way to handle this situation is to make the requested edit to the original answer, and then raise a "moderator intervention" flag. I expect your other post would have been undeleted upon request. – halfer Mar 02 '23 at 11:39