
Is it possible to extract json fields that are nested inside a log?

Sample I've been working on:

thread-191555 app.main - [cid: 2cacd6f9-546d-41ew-a7ce-d5d41b39eb8f, uid: e6ffc3b0-2f39-44f7-85b6-1abf5f9ad970] Request: protocol=[HTTP/1.0] method=[POST] path=[/metrics] headers=[Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache] entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }

What I want to achieve is:

{
"extract": "text",
"duration": "451"
}

I tried to combine a sample regex ("(extract)"\s*:\s*"([^"]+)",?) with example_parser %{data::json} (using just the JSON as the log sample, for starters), but I haven't managed to get anything working.

Thanks in advance!

user3285241

Is that sample text formatted properly? The final entity object is missing a ] from the end.

entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }

should be

entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }]

I'm going to continue these instructions assuming that was a typo and the entity field actually ends with ]. If it doesn't, I think you need to fix the underlying log to be formatted properly and close out the bracket.


Instead of skipping most of the log and parsing only that JSON bit, I decided to parse the entire thing and show what a good final result would look like. So the first thing we need to do is pull out the bracketed cid/uid pair and the set of key/value pairs that follow Request:

Example Input: thread-191555 app.main - [cid: 2cacd6f9-546d-41ew-a7ce-d5d41b39eb8f, uid: e6ffc3b0-2f39-44f7-85b6-1abf5f9ad970] Request: protocol=[HTTP/1.0] method=[POST] path=[/metrics] headers=[Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache] entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }]

Grok parser rule: app_log thread-%{integer:thread} %{notSpace:file} - \[%{data::keyvalue(": ")}\] Request: %{data:request:keyvalue("=","","[]")}

Result:

{
  "thread": 191555,
  "file": "app.main",
  "cid": "2cacd6f9-546d-41ew-a7ce-d5d41b39eb8f",
  "uid": "e6ffc3b0-2f39-44f7-85b6-1abf5f9ad970",
  "request": {
    "protocol": "HTTP/1.0",
    "method": "POST",
    "path": "/metrics",
    "headers": "Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache",
    "entity": "HttpEntity.Strict application/json {\"type\":\"text\",\"extract\": \"text\", \"field2\":\"text2\",\"duration\": 451 }"
  }
}

Notice that we use the keyvalue filter with a quoting string of []; that lets us easily pull out every bracketed value in the request object, even values such as headers that contain spaces and commas.
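To make the first rule's behavior easier to reason about, here is a rough Python approximation of it using only the standard `re` module. It is a sketch, not Datadog's implementation; it assumes (as the answer does) that the entity value ends with `]`, and that no bracketed value contains a `]` of its own. The names `parse_app_log`, `HEAD`, and `BRACKETED_KV` are made up for illustration.

```python
import re

# The sample line from the question, with the closing "]" added to entity.
LINE = (
    "thread-191555 app.main - [cid: 2cacd6f9-546d-41ew-a7ce-d5d41b39eb8f, "
    "uid: e6ffc3b0-2f39-44f7-85b6-1abf5f9ad970] Request: protocol=[HTTP/1.0] "
    "method=[POST] path=[/metrics] headers=[Timeout-Access: <function1>, "
    "Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, "
    "X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, "
    "Accept: application/json, text/plain, */*, Referer: https://google.com, "
    "Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, "
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, "
    "Cache-Control: no-cache] entity=[HttpEntity.Strict application/json "
    '{"type":"text","extract": "text", "field2":"text2","duration": 451 }]'
)

# thread-%{integer:thread} %{notSpace:file} - \[%{data}\] Request: %{data}
HEAD = re.compile(r"thread-(\d+) (\S+) - \[(.*?)\] Request: (.*)")
# key=[value] pairs, with "[" and "]" acting as the quoting characters.
BRACKETED_KV = re.compile(r"(\w+)=\[([^\]]*)\]")


def parse_app_log(line: str) -> dict:
    m = HEAD.match(line)
    if m is None:
        raise ValueError("line does not match the app_log layout")
    thread, source_file, ids, request = m.groups()
    result = {"thread": int(thread), "file": source_file}
    # "cid: ..., uid: ..." -> {"cid": ..., "uid": ...}, like keyvalue(": ")
    for pair in ids.split(", "):
        key, _, value = pair.partition(": ")
        result[key] = value
    # protocol, method, path, headers, entity -> nested "request" object
    result["request"] = dict(BRACKETED_KV.findall(request))
    return result


parsed = parse_app_log(LINE)
print(parsed["request"]["method"])  # POST
```

The `[^\]]*` class plays the same role as the `[]` quoting string: it lets a value run across spaces and commas and stops only at the closing bracket.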


Now the goal is to pull out the details from the entity field inside the request object. With Grok parsers you can choose a specific attribute to parse further, instead of the whole log message.

So in that same pipeline we'll add another Grok parser processor, right after the first one.

Then configure its advanced settings to run on request.entity, since that is what we named the attribute.

Example Input: HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }

Grok Parser Rule: entity_rule %{notSpace:request.entity.class} %{notSpace:request.entity.media_type} %{data:request.entity.json:json}

Result:

{
  "request": {
    "entity": {
      "class": "HttpEntity.Strict",
      "media_type": "application/json",
      "json": {
        "duration": 451,
        "extract": "text",
        "type": "text",
        "field2": "text2"
      }
    }
  }
}
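The second rule only splits off two space-free tokens and hands the remainder to the json filter. A rough Python equivalent, offered as a sketch rather than Datadog's behavior, makes that explicit; `parse_entity` is a hypothetical helper name. The last lines also build the exact shape asked for in the question, with duration turned into a string.

```python
import json


def parse_entity(entity: str) -> dict:
    """Approximates:
    %{notSpace:request.entity.class} %{notSpace:request.entity.media_type} %{data:request.entity.json:json}
    """
    # First two space-separated tokens; everything after them is the JSON body.
    entity_class, media_type, body = entity.split(" ", 2)
    return {"class": entity_class, "media_type": media_type, "json": json.loads(body)}


ENTITY = 'HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }'
entity = parse_entity(ENTITY)
print(entity["json"]["extract"], entity["json"]["duration"])  # text 451

# The two fields from the question, with duration as a string.
wanted = {"extract": entity["json"]["extract"], "duration": str(entity["json"]["duration"])}
```

Because the whole object goes through a real JSON parser, key order and spacing inside the braces do not matter, which is the main advantage over matching individual fields with a regex.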

Now the final parsed log has everything we need broken out.


Also, just because it was really simple, I added a third Grok processor for the headers chunk (its advanced settings are set to parse from request.headers):

Example Input: Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, Authorization: ***, Accept: application/json, text/plain, */*, Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, Cache-Control: no-cache

Grok Parser Rule: headers_rule %{data:request.headers:keyvalue(": ", "/)(; :")}

Result:

{
  "request": {
    "headers": {
      "Timeout-Access": "function1",
      "Remote-Address": "192.168.0.1:37936",
      "Host": "app:5000",
      "Connection": "close",
      "X-Real-Ip": "192.168.1.1",
      "X-Forwarded-For": "192.168.1.1",
      "Accept": "application/json",
      "Referer": "https://google.com",
      "Accept-Language": "cs-CZ",
      "Accept-Encoding": "gzip",
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko",
      "Cache-Control": "no-cache"
    }
  }
}

The only tricky part is that I had to define a characterWhiteList of "/)(; :", mostly to handle all the special characters in the User-Agent field.
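To see why the whitelist matters, here is a rough Python approximation of `keyvalue(": ", "/)(; :")`. It assumes the default allowed value characters are roughly word characters plus `.`, `-`, `_`, and `@` (check the Datadog docs for the exact set), and simply adds the whitelist to them; commas are not allowed, so a comma ends a value. This is a sketch, not Datadog's algorithm, and `parse_headers`, `VALUE_CHARS`, and `HEADER_KV` are hypothetical names.

```python
import re

HEADERS = (
    "Timeout-Access: <function1>, Remote-Address: 192.168.0.1:37936, Host: app:5000, "
    "Connection: close, X-Real-Ip: 192.168.1.1, X-Forwarded-For: 192.168.1.1, "
    "Authorization: ***, Accept: application/json, text/plain, */*, "
    "Referer: https://google.com, Accept-Language: cs-CZ, Accept-Encoding: gzip, deflate, "
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko, "
    "Cache-Control: no-cache"
)

# Assumed default value characters plus the extra whitelist "/)(; :".
VALUE_CHARS = r"\w.\-@/)(; :"
HEADER_KV = re.compile(r"([\w\-]+): ([" + VALUE_CHARS + r"]+)")


def parse_headers(raw: str) -> dict:
    """Approximates keyvalue(": ", "/)(; :") on the headers string."""
    return {key: value.strip() for key, value in HEADER_KV.findall(raw)}


headers = parse_headers(HEADERS)
print(headers["User-Agent"])
```

Without the space, parentheses, `;`, and `/` in the allowed set, the User-Agent value would be cut off after `Mozilla`. Unlike the Datadog result above, this approximation drops Timeout-Access entirely because its value starts with `<`; like Datadog, it drops Authorization (`***`) and keeps only `application/json` for Accept.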


References:

Just the documentation and some guess-and-check in my personal Datadog account.

https://docs.datadoghq.com/logs/processing/parsing/?tab=matcher#key-value-or-logfmt

draav
  • Thank you very much! But is it possible to extract exactly one field from that nested JSON? I can't see anything like that in the documentation. – user3285241 Jun 01 '20 at 07:52
  • The json parser will not be able to extract exactly one field from the nested JSON object. However, you could use the normal Grok parser tools to target that field directly, something like `rule %{data}"extract": *"%{regex("[^\"]*"):extract}"%{data}`. I look for the pattern `"extract": *"` and then grab the text that appears after it. I wasn't sure how to make the wildcard stop at the closing quote, so I use the pattern `[^\"]*` to get the text inside the string. If you are trying to collect multiple values this way, you'd have to find a way to account for their order. – draav Jun 01 '20 at 17:32
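The single-field pattern from the last comment can be tried outside Datadog with Python's `re` module. This sketch uses the same idea, not the Grok parser itself: `"extract": *"` anchors on the key, and `[^"]*` captures everything up to the next double quote, so the capture cannot run past the end of the string value. The `duration` pattern and the `find` helper are hypothetical additions for illustration.

```python
import re

# Tail of the sample line from the question.
TAIL = 'entity=[HttpEntity.Strict application/json {"type":"text","extract": "text", "field2":"text2","duration": 451 }]'

# Same idea as: %{data}"extract": *"%{regex("[^\"]*"):extract}"%{data}
EXTRACT = re.compile(r'"extract": *"([^"]*)"')
# Hypothetical companion pattern for the unquoted numeric field.
DURATION = re.compile(r'"duration": *(\d+)')


def find(pattern: re.Pattern, text: str):
    """Return the first capture group, or None if the pattern is absent."""
    m = pattern.search(text)
    return m.group(1) if m else None


wanted = {"extract": find(EXTRACT, TAIL), "duration": find(DURATION, TAIL)}
print(wanted)  # {'extract': 'text', 'duration': '451'}
```

In Python each field is searched for independently, so the order of keys inside the JSON does not matter here; a single Grok rule matches left to right, which is why the comment warns about order when collecting several values.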