
I have the following infrastructure:

ELK installed as Docker containers, each service in its own container. On a virtual machine running CentOS I installed the nginx web server and Filebeat to collect the logs, and I enabled the nginx module in Filebeat:

> filebeat modules enable nginx
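
Enabling the module activates modules.d/nginx.yml. Roughly, that file looks like the sketch below (the paths are the module defaults and I did not change them, so treat the exact contents as an assumption):

- module: nginx
  # Access logs; with var.paths commented out, Filebeat picks the default
  # location for the OS (/var/log/nginx/access.log on CentOS)
  access:
    enabled: true
    #var.paths: ["/var/log/nginx/access.log*"]
  # Error logs; same default behaviour
  error:
    enabled: true
    #var.paths: ["/var/log/nginx/error.log*"]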

Before starting Filebeat I set it up with Elasticsearch and installed its dashboards in Kibana.

The Filebeat config file (I have removed unnecessary comments from it):

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.kibana:
  host: "172.17.0.1:5601"

output.elasticsearch:
  hosts: ["172.17.0.1:9200"]

Then, to set it up in Elasticsearch and Kibana:

> filebeat setup -e --dashboards

This works fine. In fact, if I keep it this way everything works perfectly: I can use the collected logs in Kibana, along with the nginx dashboards installed by the above command.

However, I want to pass the logs through Logstash (the corresponding Filebeat output change is shown after the Logstash config below). My Logstash configuration uses the following pipelines:

- pipeline.id: filebeat
  path.config: "config/filebeat.conf"

filebeat.conf:

input {
  beats {
    port => 5044
  }
}


#filter {
#  mutate {
#    add_tag => ["filebeat"]
#  }
#}


output {
  elasticsearch {
    hosts => ["elasticsearch0:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }

  stdout { }
}
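
To route the logs through Logstash I also switched the Filebeat output from Elasticsearch to Logstash. A minimal sketch of that change (Filebeat only allows one output, so output.elasticsearch is commented out; the Logstash host here is an assumption, the same Docker bridge address I use for the other containers):

#output.elasticsearch:
#  hosts: ["172.17.0.1:9200"]

output.logstash:
  # assumption: the Logstash container is reachable on the Docker bridge address,
  # listening on the beats input port from filebeat.conf
  hosts: ["172.17.0.1:5044"]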

With the logs going through Logstash, the resulting document is just:

{
        "offset" => 6655,
      "@version" => "1",
    "@timestamp" => 2019-02-20T13:34:06.886Z,
       "message" => "10.0.2.2 - - [20/Feb/2019:08:33:58 -0500] \"GET / HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.98 Chrome/71.0.3578.98 Safari/537.36\" \"-\"",
          "beat" => {
         "version" => "6.5.4",
            "name" => "localhost.localdomain",
        "hostname" => "localhost.localdomain"
    },
        "source" => "/var/log/nginx/access.log",
          "host" => {
                   "os" => {
             "version" => "7 (Core)",
            "codename" => "Core",
              "family" => "redhat",
            "platform" => "centos"
        },
                 "name" => "localhost.localdomain",
                   "id" => "18e7cb2506624fb6ae2dc3891d5d7172",
        "containerized" => true,
         "architecture" => "x86_64"
    },
       "fileset" => {
          "name" => "access",
        "module" => "nginx"
    },
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
         "input" => {
        "type" => "log"
    },
    "prospector" => {
        "type" => "log"
    }
}

A lot of fields are missing from my object. There should be much more structured information.

UPDATE: This is what I'm expecting instead:

{
  "_index": "filebeat-6.5.4-2019.02.20",
  "_type": "doc",
  "_id": "ssJPC2kBLsya0HU-3uwW",
  "_version": 1,
  "_score": null,
  "_source": {
    "offset": 9639,
    "nginx": {
      "access": {
        "referrer": "-",
        "response_code": "404",
        "remote_ip": "10.0.2.2",
        "method": "GET",
        "user_name": "-",
        "http_version": "1.1",
        "body_sent": {
          "bytes": "3650"
        },
        "remote_ip_list": [
          "10.0.2.2"
        ],
        "url": "/access",
        "user_agent": {
          "patch": "3578",
          "original": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.98 Chrome/71.0.3578.98 Safari/537.36",
          "major": "71",
          "minor": "0",
          "os": "Ubuntu",
          "name": "Chromium",
          "os_name": "Ubuntu",
          "device": "Other"
        }
      }
    },
    "prospector": {
      "type": "log"
    },
    "read_timestamp": "2019-02-20T14:29:36.393Z",
    "source": "/var/log/nginx/access.log",
    "fileset": {
      "module": "nginx",
      "name": "access"
    },
    "input": {
      "type": "log"
    },
    "@timestamp": "2019-02-20T14:29:32.000Z",
    "host": {
      "os": {
        "codename": "Core",
        "family": "redhat",
        "version": "7 (Core)",
        "platform": "centos"
      },
      "containerized": true,
      "name": "localhost.localdomain",
      "id": "18e7cb2506624fb6ae2dc3891d5d7172",
      "architecture": "x86_64"
    },
    "beat": {
      "hostname": "localhost.localdomain",
      "name": "localhost.localdomain",
      "version": "6.5.4"
    }
  },
  "fields": {
    "@timestamp": [
      "2019-02-20T14:29:32.000Z"
    ]
  },
  "sort": [
    1550672972000
  ]
}
500 Server error
  • It doesn't look like you are parsing the log message. There's an example in the logstash documentation on this: https://www.elastic.co/guide/en/logstash/6.6/logstash-config-for-filebeat-modules.html#parsing-nginx – baudsp Feb 20 '19 at 16:25
  • Thanks, that is helpful. Although I thought that since Filebeat sends the full object when sending directly to Elasticsearch, passing through Logstash should do the same. I had this working on my local PC without any filtering in Logstash, and now I can't get it to work; that's why it's so strange to me. @baudsp – 500 Server error Feb 20 '19 at 16:28
  • I don't know. I have very little experience with filebeat; I didn't even know it could do some parsing of its own. – baudsp Feb 21 '19 at 08:40
  • @baudsp if you have time, try using Filebeat directly with Elasticsearch and Kibana. Install the dashboards and indexes like this: filebeat setup --dashboards. Enable some module, even the system module: filebeat modules enable system, and then run it. Try opening one of the system dashboards in Kibana; it's pretty nice. – 500 Server error Feb 21 '19 at 09:24
  • @baudsp your first comment almost worked. It is parsing the data, though not enough to work with the predefined Kibana dashboards, but I will build on it from there. Post it as an answer and I'll accept it if you want! – 500 Server error Feb 21 '19 at 09:26
  • Looks like this same discussion was also held here: https://discuss.elastic.co/t/filebeat-input-in-logstash-is-losing-fields/169460 – Matthew Clark Oct 17 '19 at 19:55

2 Answers


The answer provided by @baudsp was mostly correct, but it was incomplete. I had exactly the same problem, and I also had exactly the same filter mentioned in the documentation (and in @baudsp's answer), but documents in Elasticsearch still did not contain any of the expected fields.

I finally found the problem: because I had Filebeat configured to send the nginx logs via the Nginx module and not the Log input, the data coming from Filebeat didn't quite match what the example Logstash filter was expecting.

The conditional in the example is if [fileset][module] == "nginx", which is correct if Filebeat is sending data from a Log input. However, since the log data is coming from the Nginx module, the fileset property doesn't contain a module property.

To make the filter work with Logstash data coming from the Nginx module, the conditional needs to look for something else. I found [event][module] to work in place of [fileset][module].

The working filter:

filter {
  if [event][module] == "nginx" {
    if [fileset][name] == "access" {
      grok {
        match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
        remove_field => "message"
      }
      mutate {
        add_field => { "read_timestamp" => "%{@timestamp}" }
      }
      date {
        match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
        remove_field => "[nginx][access][time]"
      }
      useragent {
        source => "[nginx][access][agent]"
        target => "[nginx][access][user_agent]"
        remove_field => "[nginx][access][agent]"
      }
      geoip {
        source => "[nginx][access][remote_ip]"
        target => "[nginx][access][geoip]"
      }
    }
    else if [fileset][name] == "error" {
      grok {
        match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
        remove_field => "message"
      }
      mutate {
        rename => { "@timestamp" => "read_timestamp" }
      }
      date {
        match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
        remove_field => "[nginx][error][time]"
      }
    }
  }
}

Now, documents in Elasticsearch have all of the expected fields:

[Screenshot of an nginx access log entry in Elasticsearch]

Note: You'll have the same problem with other Filebeat modules, too. Just use [event][module] in place of [fileset][module].
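
If the same Logstash pipeline has to accept events of both shapes (some Filebeat instances shipping via a module, others via a plain Log input), a sketch of a combined conditional, which I haven't needed myself, could look like this:

filter {
  if [event][module] == "nginx" or [fileset][module] == "nginx" {
    # the same nginx access/error parsing as above goes here; as a placeholder,
    # just tag the event so the match is visible
    mutate {
      add_tag => ["nginx_module"]
    }
  }
}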

Matthew Clark

From your Logstash configuration, it doesn't look like you are parsing the log message.

There's an example in the logstash documentation on how to parse nginx logs:

Nginx Logs

The Logstash pipeline configuration in this example shows how to ship and parse access and error logs collected by the nginx Filebeat module.

  input {
    beats {
      port => 5044
      host => "0.0.0.0"
    }
  }
  filter {
    if [fileset][module] == "nginx" {
      if [fileset][name] == "access" {
        grok {
          match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
          remove_field => "message"
        }
        mutate {
          add_field => { "read_timestamp" => "%{@timestamp}" }
        }
        date {
          match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
          remove_field => "[nginx][access][time]"
        }
        useragent {
          source => "[nginx][access][agent]"
          target => "[nginx][access][user_agent]"
          remove_field => "[nginx][access][agent]"
        }
        geoip {
          source => "[nginx][access][remote_ip]"
          target => "[nginx][access][geoip]"
        }
      }
      else if [fileset][name] == "error" {
        grok {
          match => { "message" => ["%{DATA:[nginx][error][time]} \[%{DATA:[nginx][error][level]}\] %{NUMBER:[nginx][error][pid]}#%{NUMBER:[nginx][error][tid]}: (\*%{NUMBER:[nginx][error][connection_id]} )?%{GREEDYDATA:[nginx][error][message]}"] }
          remove_field => "message"
        }
        mutate {
          rename => { "@timestamp" => "read_timestamp" }
        }
        date {
          match => [ "[nginx][error][time]", "YYYY/MM/dd H:m:s" ]
          remove_field => "[nginx][error][time]"
        }
      }
    }
  }

I know this doesn't address why Filebeat doesn't send the full object to Logstash, but it should give you a start on how to parse the nginx logs in Logstash.

baudsp
  • Is there anything similar to configure auditbeat in logstash? I can't find any examples. – 500 Server error Feb 21 '19 at 14:24
  • @500Servererror I haven't found anything in the documentation. You'd have to write it yourself. – baudsp Feb 21 '19 at 16:22
  • Strangely enough, Auditbeat is sending all the fields. – 500 Server error Feb 21 '19 at 17:01
  • I'm having exactly the same problem as described in the original question: Filebeat sending nginx logs through Logstash instead of directly to Elasticsearch results in pretty much all of the interesting fields being missing (everything I want to see is still only in the message field). I have the Logstash configuration exactly as above (from the same source), so unless I'm missing something else, this does not appear to be the solution. – Matthew Clark Oct 17 '19 at 19:39
  • Probably depends on the ELK stack version; this was answered almost a year ago. – 500 Server error Oct 19 '19 at 11:31