Correlate messages in ELK by field

Question

Related to: Combine logs and query in ELK

We are setting up ELK and would want to create a visualization in Kibana 4. The issue here is that we want to relate between two different types of message.

To simplify:

Message type 1 fields: message_type, common_id_number, byte_count, ...
Message type 2 fields: message_type, common_id_number, hostname, ...

Both messages share the same index in elasticsearch.

As you can see we were trying to graph without taking that common_id_number into account, but it seems that we must use it. We don't know how yet, though.

Any help?

EDIT

These are the relevant field definitions in the ES template:

      "URIHost" : {
        "type" : "string",
        "norms" : {
          "enabled" : false
        },
        "fields" : {
          "raw" : {
            "type" : "string",
            "index" : "not_analyzed",
            "ignore_above" : 256
          }
        }
      },
      "Type" : {
        "type" : "string",
        "norms" : {
          "enabled" : false
        },
        "fields" : {
          "raw" : {
            "type" : "string",
            "index" : "not_analyzed",
            "ignore_above" : 256
          }
        }
      },
      "SessionID" : {
        "type" : "long"
      },
      "Bytes" : {
        "type" : "long"
      },
      "BytesReceived" : {
        "type" : "long"
      },
      "BytesSent" : {
        "type" : "long"
      },

This is a TRAFFIC type, edited document:

{
  "_index": "logstash-2015.11.05",
  "_type": "paloalto",
  "_id": "AVDZqdBjpQiRid-uxPjE",
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2015-11-05T21:59:55.543Z",
    "syslog_severity_code": 5,
    "syslog_facility_code": 1,
    "syslog_timestamp": "Nov  5 22:59:58",
    "Type": "TRAFFIC",
    "SessionID": 21713,
    "Bytes": 939,
    "BytesSent": 480,
    "BytesReceived": 459,
  },
  "fields": {
    "@timestamp": [
      1446760795543
    ]
  },
  "sort": [
    1446760795543
  ]
}

And this is a THREAT type document:

{
  "_index": "logstash-2015.11.05",
  "_type": "paloalto",
  "_id": "AVDZqVNIpQiRid-uxPjC",
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2015-11-05T21:59:23.440Z",
    "syslog_severity_code": 5,
    "syslog_facility_code": 1,
    "syslog_timestamp": "Nov  5 22:59:26",
    "Type": "THREAT",
    "SessionID": 21713,
    "URIHost": "whatever.nevermind.com",
    "URIPath": "/connectiontest.html"
  },
  "fields": {
    "@timestamp": [
      1446760763440
    ]
  },
  "sort": [
    1446760763440
  ]
}

This is the logstash "filter" configuration:

filter {
    if [type] == "paloalto" {
        syslog_pri {
            remove_field => [ "syslog_facility", "syslog_severity" ]
        }

        grok {
            match => {
                "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{HOSTNAME:hostname} %{INT},%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME},%{INT},%{WORD:Type},%{GREEDYDATA:log}"
            }
            remove_field => [ "message" ]
        }

        if [Type] == "THREAT" {
            csv {
                source => "log"
                columns => [ "Threat_OR_ContentType", "ConfigVersion", "GenerateTime", "SourceAddress", "DestinationAddress", "NATSourceIP", "NATDestinationIP", "Rule", "SourceUser", "DestinationUser", "Application", "VirtualSystem", "SourceZone", "DestinationZone", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "SessionID", "RepeatCount", "SourcePort", "DestinationPort", "NATSourcePort", "NATDestinationPort", "Flags", "IPProtocol", "Action", "URL", "Threat_OR_ContentName", "reportid", "Category", "Severity", "Direction", "seqno", "actionflags", "SourceCountry", "DestinationCountry", "cpadding", "contenttype", "pcap_id", "filedigest", "cloud", "url_idx", "user_agent", "filetype", "xff", "referer", "sender", "subject", "recipient" ]
                remove_field => [ "log" ]
            }
            mutate {
                convert => {
                    "SessionID" => "integer"
                    "SourcePort" => "integer"
                    "DestinationPort" => "integer"
                    "NATSourcePort" => "integer"
                    "NATDestinationPort" => "integer"
                }
                remove_field => [ "ConfigVersion", "GenerateTime", "VirtualSystem", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "RepeatCount", "Flags", "Action", "reportid", "Severity", "seqno", "actionflags", "cpadding", "pcap_id", "filedigest", "recipient" ]
            }
            grok {
                match => {
                    "URL" => "%{URIHOST:URIHost}%{URIPATH:URIPath}(%{URIPARAM:URIParam})?"
                }
                remove_field => [ "URL" ]
            }
        }

        else if [Type] == "TRAFFIC" {
            csv {
                source => "log"
                columns => [ "Threat_OR_ContentType", "ConfigVersion", "GenerateTime", "SourceAddress", "DestinationAddress", "NATSourceIP", "NATDestinationIP", "Rule", "SourceUser", "DestinationUser", "Application", "VirtualSystem", "SourceZone", "DestinationZone", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "SessionID", "RepeatCount", "SourcePort", "DestinationPort", "NATSourcePort", "NATDestinationPort", "Flags", "IPProtocol", "Action", "Bytes", "BytesSent", "BytesReceived", "Packets", "StartTime", "ElapsedTimeInSecs", "Category", "Padding", "seqno", "actionflags", "SourceCountry", "DestinationCountry", "cpadding", "pkts_sent", "pkts_received", "session_end_reason" ]
                remove_field => [ "log" ]
            }
            mutate {
                convert => {
                    "SessionID" => "integer"
                    "SourcePort" => "integer"
                    "DestinationPort" => "integer"
                    "NATSourcePort" => "integer"
                    "NATDestinationPort" => "integer"
                    "Bytes" => "integer"
                    "BytesSent" => "integer"
                    "BytesReceived" => "integer"
                    "ElapsedTimeInSecs" => "integer"
                }
                remove_field => [ "ConfigVersion", "GenerateTime", "VirtualSystem", "InboundInterface", "OutboundInterface", "LogAction", "TimeLogged", "RepeatCount", "Flags", "Action", "Packets", "StartTime", "seqno", "actionflags", "cpadding", "pcap_id", "filedigest", "recipient" ]
            }
        }

        date {
            match => [ "syslog_timastamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
            timezone => "CET"
            remove_field => [ "syslog_timestamp" ]
        }
    }
}

What we are trying to do is to visualize URIHost terms as X axis and Bytes, BytesSent and BytesReceived sums as Y axis.

What can be the values of the `message_type` field and do "type 1" messages always come before "type 2" ones? — Val, Nov 08 '15 at 15:08
Also can you share your existing Logstash configuration so people are not guessing on your setup? — Val, Nov 08 '15 at 15:22
"As you can see" means "please stare at my screen shot and try to reverse engineer what I meant to do". Can you better describe the data that exists in elasticsearch (maybe a table with a real sample) and how you would like this data presented (show us how it should be combined, etc, just in a table), and then better describe your issues with visualizing it. — Alain Collins, Nov 08 '15 at 16:57
@Val: One or many "THREAT" type messages should come before a single "TRAFFIC" type message — AMS, Nov 09 '15 at 08:48
Thanks, so to sum up, you have 1+ THREAT logs and 1 ending TRAFFIC log, all of which share the same `SessionID`? Is that correct? In case you have two or more THREAT logs, do you also want to aggregate them together? — Val, Nov 09 '15 at 08:54
@Val: Correct, one or several THREAT messages share the same SessionID with a single TRAFFIC end-session message. Can we display those ES aggregations (https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations.html) in Kibana? Could you answer with an example? — AMS, Nov 09 '15 at 09:06
I'll come up with something, I just need to know what happens in the case there are several THREAT messages, will they contain different `URIHost` values or will it always be the same given the same `SessionID`? — Val, Nov 09 '15 at 09:08
SessionID refer to TCP sessions identifiers. If a TCP session to a host is reused for different HTTP requests then those THREAT messages related to each request will share the SessionID related to the underlying TCP connection. — AMS, Nov 09 '15 at 09:48

score 4 · Accepted Answer · answered Nov 09 '15 at 09:55

I think you can use the aggregate filter to carry out your task. The aggregate filter provides support for aggregating several log lines into one single event based on a common field value. In your case, the common field we're going to use will be the SessionID field.

Then we need another field to detect the first event vs the second/last event that should be aggregated. In your case, this would be the Type field.

You need to change your current configuration like this:

filter {

    ... all other filters

    if [Type] == "THREAT" {
        ... all other filters

        aggregate {
            task_id => "%{SessionID}"
            code => "map['URIHost'] = event['URIHost']; map['URIPath'] = event['URIPath']"
        }
    }

    else if [Type] == "TRAFFIC" {
        ... all other filters

        aggregate {
            task_id => "%{SessionID}"
            code => "event['URIHost'] = map['URIHost']; event['URIPath'] = map['URIPath']"
            end_of_task => true
            timeout => 120
        }
    }
}

The general idea is that when Logstash encounters THREAT logs it will temporarily store the URIHost and URIPath in the in-memory event map, and then when a TRAFFIC log comes in, the URIHost and URIPath fields will be added to the event. You can copy other fields, too, if needed. You can also adapt the timeout (in seconds) depending on how long you expect a TRAFFIC event to come in after the last THREAT event.

In the end, you'll get documents with data merged from both THREAT and TRAFFIC log lines and you can easily create the visualization showing bytes count per URIHost as shown on your screenshot.

I left out the URIPath from the analysis to make URIHost stand out better. Thank you! — AMS, Nov 09 '15 at 11:54

Correlate messages in ELK by field

1 Answers1

Linked