
I am trying to ship my K8s pod logs to Elasticsearch using Filebeat.

I am following the guide online here: https://www.elastic.co/guide/en/beats/filebeat/6.0/running-on-kubernetes.html

Everything works as expected; however, I want to filter out events from system pods. My updated config looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-prospectors
  namespace: kube-system
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
data:
  kubernetes.yml: |-
    - type: log
      paths:
        - /var/lib/docker/containers/*/*.log
      multiline.pattern: '^\s'
      multiline.match: after
      json.message_key: log
      json.keys_under_root: true
      processors:
        - add_kubernetes_metadata:
            in_cluster: true
            namespace: ${POD_NAMESPACE}
        - drop_event.when.regexp:
            or:
              kubernetes.pod.name: "weave-net.*"
              kubernetes.pod.name: "external-dns.*"
              kubernetes.pod.name: "nginx-ingress-controller.*"
              kubernetes.pod.name: "filebeat.*"

I am trying to ignore weave-net, external-dns, ingress-controller and filebeat events via:

- drop_event.when.regexp:
    or:
      kubernetes.pod.name: "weave-net.*"
      kubernetes.pod.name: "external-dns.*"
      kubernetes.pod.name: "nginx-ingress-controller.*"
      kubernetes.pod.name: "filebeat.*"

However they continue to arrive in Elasticsearch.

timothyclifford
  • I used a different approach which is less efficient but works. My filebeat instances forward all data to a logstash instance and filtering of "good" and "bad" logs is made there using pod labels. If you are interested I can post the solution as an answer. – whites11 Dec 08 '17 at 07:31
  • Yes would be very interested! I'm not certain my current approach is the best... – timothyclifford Dec 08 '17 at 09:04

3 Answers


The conditions need to be a list:

- drop_event.when.regexp:
    or:
      - kubernetes.pod.name: "weave-net.*"
      - kubernetes.pod.name: "external-dns.*"
      - kubernetes.pod.name: "nginx-ingress-controller.*"
      - kubernetes.pod.name: "filebeat.*"

I'm not sure if your order of parameters works. One of my working examples looks like this:

- drop_event:
    when:
      or:
        # Exclude traces from Zipkin
        - contains.path: "/api/v"
        # Exclude Jolokia calls
        - contains.path: "/jolokia/?"
        # Exclude pinging metrics
        - equals.path: "/metrics"
        # Exclude pinging health
        - equals.path: "/health"
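
Applied to the config in the question, a sketch of the same processor with the conditions as a list and the when level written out explicitly (untested, but following the Filebeat 6.x condition syntax for regexp on kubernetes.pod.name) would be:

- drop_event:
    when:
      or:
        - regexp:
            kubernetes.pod.name: "weave-net.*"
        - regexp:
            kubernetes.pod.name: "external-dns.*"
        - regexp:
            kubernetes.pod.name: "nginx-ingress-controller.*"
        - regexp:
            kubernetes.pod.name: "filebeat.*"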
xeraa

This worked for me in Filebeat 6.1.3:

- drop_event.when:
    or:
    - equals:
        kubernetes.container.name: "filebeat"
    - equals:
        kubernetes.container.name: "prometheus-kube-state-metrics"
    - equals:
        kubernetes.container.name: "weave-npc"
    - equals:
        kubernetes.container.name: "nginx-ingress-controller"
    - equals:
        kubernetes.container.name: "weave"
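
For context, in the prospector configuration from the question this goes into the same processors list, after add_kubernetes_metadata (which is what populates the kubernetes.* fields), roughly like this (a sketch, not a tested config):

processors:
  - add_kubernetes_metadata:
      in_cluster: true
      namespace: ${POD_NAMESPACE}
  - drop_event.when:
      or:
        - equals:
            kubernetes.container.name: "filebeat"
        - equals:
            kubernetes.container.name: "weave-npc"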
Johny Jho
  • Note that the container name is the most consistent and fitting filter variable if you want to drop events for all instances of e.g. filebeat (compared to pod name, path or line-based regex in other answers). – thorsten Oct 05 '21 at 07:49

I am using a different approach, which is less efficient in terms of the number of logs that transit through the logging pipeline.

Similarly to what you did, I deployed one instance of filebeat on each of my nodes using a DaemonSet. Nothing special here; this is the configuration I am using:

apiVersion: v1
data:
  filebeat.yml: |-
    filebeat.config:
      prospectors:
        # Mounted `filebeat-prospectors` configmap:
        path: ${path.config}/prospectors.d/*.yml
        # Reload prospectors configs as they change:
        reload.enabled: false
      modules:
        path: ${path.config}/modules.d/*.yml
        # Reload module configs as they change:
        reload.enabled: false

    processors:
      - add_cloud_metadata:

    output.logstash:
      hosts: ['logstash.elk.svc.cluster.local:5044']
kind: ConfigMap
metadata:
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
  name: filebeat-config

And this one for the prospectors:

apiVersion: v1
data:
  kubernetes.yml: |-
    - type: log
      paths:
        - /var/lib/docker/containers/*/*.log
      json.message_key: log
      json.keys_under_root: true
      processors:
        - add_kubernetes_metadata:
            in_cluster: true
            namespace: ${POD_NAMESPACE}
kind: ConfigMap
metadata:
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
  name: filebeat-prospectors

The Daemonset spec:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
  name: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: filebeat
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - args:
        - -c
        - /etc/filebeat.yml
        - -e
        command:
        - /usr/share/filebeat/filebeat
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: docker.elastic.co/beats/filebeat:6.0.1
        imagePullPolicy: IfNotPresent
        name: filebeat
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        securityContext:
          runAsUser: 0
        volumeMounts:
        - mountPath: /etc/filebeat.yml
          name: config
          readOnly: true
          subPath: filebeat.yml
        - mountPath: /usr/share/filebeat/prospectors.d
          name: prospectors
          readOnly: true
        - mountPath: /usr/share/filebeat/data
          name: data
        - mountPath: /var/lib/docker/containers
          name: varlibdockercontainers
          readOnly: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          name: filebeat-config
        name: config
      - hostPath:
          path: /var/lib/docker/containers
          type: ""
        name: varlibdockercontainers
      - configMap:
          defaultMode: 384
          name: filebeat-prospectors
        name: prospectors
      - emptyDir: {}
        name: data
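
Assuming each of the three manifests above is saved to its own file (the file names below are just placeholders), everything can be deployed with kubectl:

kubectl apply -f filebeat-config.yaml
kubectl apply -f filebeat-prospectors.yaml
kubectl apply -f filebeat-daemonset.yaml
# add -n <namespace> if the resources should live in a specific namespace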

Basically, all data from all logs from all containers gets forwarded to logstash, reachable at the service endpoint: logstash.elk.svc.cluster.local:5044 (service called "logstash" in the "elk" namespace).

For brevity, I'm only going to give you the logstash configuration (if you need more specific help with kubernetes, please ask in the comments).

The logstash.yml file is very basic:

http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline

It just points at the directory where the pipeline configuration files are mounted; those files are the following:

10-beats.conf: declares an input for filebeat (port 5044 has to be exposed with a service called "logstash")

input {
  beats {
    port => 5044
    ssl => false
  }
}
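
The "logstash" service itself is not shown here; a minimal sketch of it (assuming the logstash pods are labelled app: logstash) would be a plain ClusterIP service in the "elk" namespace exposing port 5044:

apiVersion: v1
kind: Service
metadata:
  name: logstash
  namespace: elk
spec:
  selector:
    app: logstash   # assumed label on the logstash pods
  ports:
    - port: 5044
      targetPort: 5044
      protocol: TCP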

49-filter-logs.conf: this filter drops logs coming from pods that don't have the "elk" label. For pods that do have it, it keeps only the logs from the containers named in the pod's "elk" label. For instance, if a pod has two containers called "nginx" and "python", setting the label "elk" to "nginx" will keep the logs coming from the nginx container and drop the python ones. The type of the log is set to the namespace the pod is running in. This might not be a good fit for everybody (you end up with a single index in elasticsearch for all logs belonging to a namespace), but it works for me because my logs are homogeneous.

filter {
    if ![kubernetes][labels][elk] {
        drop {}
    }
    if [kubernetes][labels][elk] {
        # check if kubernetes.labels.elk contains this container name
        mutate {
          split => { "kubernetes[labels][elk]" => "." }
        }
        if [kubernetes][container][name] not in [kubernetes][labels][elk] {
          drop {}
        }
        mutate {
          replace => { "@metadata[type]" => "%{kubernetes[namespace]}" }
          remove_field => [ "beat", "host", "kubernetes[labels][elk]", "kubernetes[labels][pod-template-hash]", "kubernetes[namespace]", "kubernetes[pod][name]", "offset", "prospector[type]", "source", "stream", "time" ]
          rename => { "kubernetes[container][name]" => "container"  }
          rename => { "kubernetes[labels][app]" => "app"  }
        }
    }
}
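
Opting a pod into the pipeline is then just a matter of labels. For the nginx/python example above, the pod template metadata might look like this (a sketch; thanks to the mutate split, several container names can be joined with "."):

metadata:
  labels:
    app: myapp            # hypothetical app label
    elk: nginx            # keep only the nginx container's logs
    # elk: nginx.python   # would keep logs from both containers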

The rest of the configuration is about log parsing and is not relevant in this context. The only other important part is the output:

99-output.conf: sends the data to elasticsearch:

output {
  elasticsearch {
    hosts => ["http://elasticsearch.elk.svc.cluster.local:9200"]
    manage_template => false
    index => "%{[@metadata][type]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
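
Since the @metadata type field was set to the pod's namespace in the filter above, this produces one daily index per namespace, e.g. for a hypothetical "production" namespace:

production-2017.12.08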

Hope you got the point here.

PROs of this approach

  • Once filebeat and logstash are deployed, as long as you don't need to parse a new type of log, you don't have to touch the filebeat or logstash configuration to get a new log into kibana. You just need to add a label to the pod template.
  • All logs are dropped by default, as long as you don't explicitly add the labels.

CONs of this approach

  • ALL logs from ALL pods go through filebeat and logstash, and only get dropped in logstash. This is a lot of work for logstash and can be resource-intensive depending on the number of pods in your cluster.

I am sure there are better approaches to this problem, but I think this solution is quite handy, at least for my use case.

whites11