
I'm taking a JSON message (CloudTrail, many objects concatenated together), and by the time I'm done filtering it, Logstash doesn't seem to be parsing the message correctly. It's as if the hash were simply dumped into a string.

Anyhow, here's the input and filter.

input {
  s3 {
    bucket => "stanson-ops"
    delete => false
    #snipped unimportant bits
    type => "cloudtrail"
  }
}

filter {
  if [type] == "cloudtrail" {
    json { # http://logstash.net/docs/1.4.2/filters/json
      source => "message"
    }
    ruby {
      code => "event['RecordStr'] = event['Records'].join('~~~')"
    }
    split {
      field => "RecordStr"
      terminator => "~~~"
      remove_field => [ "message", "Records" ]
    }
  }
}

By the time I'm done, Elasticsearch entries include a RecordStr key with the following data. They have neither a message field nor a Records field.

{"eventVersion"=>"1.01", "userIdentity"=>{"type"=>"IAMUser", "principalId"=>"xxx"}}

Note that this is not JSON; it's the parsed (Ruby hash) form, which is important for the concat-then-split approach to work.
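To illustrate why it looks this way: Array#join calls Hash#to_s on each record, and Hash#to_s produces Ruby's inspect-style formatting with => rather than JSON. The values below are made up purely to show the formatting:

records = [{"eventVersion" => "1.01"}, {"eventVersion" => "1.02"}]
records.join('~~~')
# => the string: {"eventVersion"=>"1.01"}~~~{"eventVersion"=>"1.02"}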

So the RecordStr key doesn't look quite right as a single value. Further, in Kibana, the filterable fields include RecordStr (with no subfields), plus entries that no longer exist in the documents: Records.eventVersion and Records.userIdentity.type.

Why is that? How can I get the proper fields?

Edit 1: here's part of the input.

{"Records":[{"eventVersion":"1.01","userIdentity":{"type":"IAMUser",

It's unprettified JSON. The body of the file (the above) lands in the message field; the json filter extracts it, and I end up with an array of records in the Records field. That's why I join and split: I then end up with individual documents, each with a single RecordStr entry. However, the index template(?) doesn't seem to understand the new structure.

tedder42
  • What does the input data look like? – Magnus Bäck Oct 31 '14 at 06:35
  • @MagnusBäck [see this documentation](http://docs.aws.amazon.com/awscloudtrail/latest/userguide/event_reference_top_level.html), which gives a readable example. – tedder42 Oct 31 '14 at 06:46
  • Does that example exactly represent what the log looks like, or has the JSON been pretty-printed? I'm asking since there's no way that your configuration would be able to parse that example. Perhaps the [cloudtrail codec](http://logstash.net/docs/1.4.2/codecs/cloudtrail) would be useful? – Magnus Bäck Oct 31 '14 at 07:05
  • @MagnusBäck I added the first hundred characters or so. Let me know if more is needed, but it appears to line up- I've placed an explanation too. The cloudtrail codec [doesn't exist on github](https://github.com/elasticsearch/logstash/tree/1.4/lib/logstash/codecs)- I don't know why there is a document for it. – tedder42 Oct 31 '14 at 15:59
  • As the documentation says, it's part of the contrib plugin package which is found in a git of its own. See the [installation instructions](http://logstash.net/docs/1.4.2/contrib-plugins). – Magnus Bäck Nov 02 '14 at 20:55
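For reference, a minimal sketch of how the contrib cloudtrail codec mentioned above might be wired in (untested; it assumes the contrib plugin package is installed per the instructions linked in the comments):

input {
  s3 {
    bucket => "stanson-ops"
    # cloudtrail codec from the logstash-contrib package
    codec => "cloudtrail"
    type => "cloudtrail"
  }
}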

1 Answer


I've worked out a method that allows for indexing the appropriate CloudTrail fields as you requested. Here are the modified input and filter configs:

input {
  s3 {
    backup_add_prefix => "processed-logs/"
    backup_to_bucket => "test-bucket"
    bucket => "test-bucket"
    delete => true
    interval => 30
    prefix => "AWSLogs/<account-id>/CloudTrail/"
    type => "cloudtrail"
  }
}

filter {
  if [type] == "cloudtrail" {
    json {
      source => "message"
    }
    ruby {
      code => "event.set('RecordStr', event.get('Records').join('~~~'))"
    }
    split {
      field => "RecordStr"
      terminator => "~~~"
      remove_field => [ "message", "Records" ]
    }
    mutate {
      gsub => [
        "RecordStr", "=>", ":"
      ]
    }
    mutate {
      gsub => [
        "RecordStr", "nil", "null"
      ]
    }
    json {
      skip_on_invalid_json => true
      source => "RecordStr"
      target => "cloudtrail"
    }
    mutate {
      add_tag => ["cloudtrail"]
      remove_field => ["RecordStr", "@version"]
    }
    date {
      match => ["[cloudtrail][eventTime]", "ISO8601"]
    }
  }
}

The key observation is that once the split is done, RecordStr no longer contains valid JSON: joining the Records array serialized each record with Ruby's inspect-style formatting, so the mutate/gsub replacements ('=>' to ':' and 'nil' to 'null') are required before the second json filter can parse it. Additionally, I found it useful to take the event timestamp from the CloudTrail eventTime and to clean up some unnecessary fields.
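One caveat: the gsub replacements are blunt, since they would also rewrite any literal '=>' or 'nil' appearing inside a field value. A sketch of an alternative (untested; it assumes Ruby's to_json is available inside the Logstash ruby filter, which it normally is): re-serialize each record to JSON before joining, so RecordStr stays valid JSON and both gsub steps can be dropped.

ruby {
  # Untested sketch: serialize each record back to JSON before joining,
  # so RecordStr remains valid JSON after the split and the two
  # mutate/gsub filters above become unnecessary.
  code => "event.set('RecordStr', event.get('Records').map(&:to_json).join('~~~'))"
}

On recent Logstash versions the split filter can also operate directly on an array field (split { field => "Records" }), which sidesteps the join/split round-trip entirely.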

daplho