I'm using fluent-bit to collect logs and pass them to fluentd for processing, in a Kubernetes environment. The fluent-bit instances are managed by a DaemonSet and read the logs of the Docker containers. The tail input is configured as follows:
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker
    Tag               kube.*
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
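The output side uses the forward protocol towards fluentd. I don't have the exact snippet at hand, but judging by the error further down it looks roughly like this (a sketch; the Match pattern is an assumption, and the Host/Port are taken from the error message):

[OUTPUT]
    Name    forward
    Match   kube.*
    # host and port as they appear in the timeout error below (assumption)
    Host    monitoring-fluent-bit-dips
    Port    24224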
There is also a fluent-bit service running:
Name:              monitoring-fluent-bit-dips
Namespace:         dips
Labels:            app.kubernetes.io/instance=monitoring
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=fluent-bit-dips
                   app.kubernetes.io/version=1.8.10
                   helm.sh/chart=fluent-bit-0.19.6
Annotations:       meta.helm.sh/release-name: monitoring
                   meta.helm.sh/release-namespace: dips
Selector:          app.kubernetes.io/instance=monitoring,app.kubernetes.io/name=fluent-bit-dips
Type:              ClusterIP
IP Families:       <none>
IP:                10.43.72.32
IPs:               <none>
Port:              http  2020/TCP
TargetPort:        http/TCP
Endpoints:         10.42.0.144:2020,10.42.1.155:2020,10.42.2.186:2020 + 1 more...
Session Affinity:  None
Events:            <none>
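Since port 2020 is fluent-bit's built-in HTTP server, one way I can dig deeper is to pull its runtime metrics and look for retries/errors on the forward output. This assumes HTTP_Server is enabled (the service existing suggests it is); the helper pod name and curl image are just for illustration:

kubectl run -n dips curl-debug --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://monitoring-fluent-bit-dips.dips.svc:2020/api/v1/metrics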
The fluentd service description is as below:
Name:              monitoring-logservice
Namespace:         dips
Labels:            app.kubernetes.io/instance=monitoring
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=logservice
                   app.kubernetes.io/version=1.9
                   helm.sh/chart=logservice-0.1.2
Annotations:       meta.helm.sh/release-name: monitoring
                   meta.helm.sh/release-namespace: dips
Selector:          app.kubernetes.io/instance=monitoring,app.kubernetes.io/name=logservice
Type:              ClusterIP
IP Families:       <none>
IP:                10.43.44.254
IPs:               <none>
Port:              http  24224/TCP
TargetPort:        http/TCP
Endpoints:         10.42.0.143:24224
Session Affinity:  None
Events:            <none>
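To rule out basic networking, I can also test DNS and the TCP path to the fluentd service from inside the cluster with a throwaway debug pod (nicolaka/netshoot is just one convenient image that ships nslookup and nc; the -vz/-w flags assume OpenBSD-style netcat):

# does the fluentd service name resolve?
kubectl run -n dips net-debug --rm -it --restart=Never --image=nicolaka/netshoot -- \
  nslookup monitoring-logservice.dips.svc.cluster.local

# can we open a TCP connection to the forward port?
kubectl run -n dips net-debug --rm -it --restart=Never --image=nicolaka/netshoot -- \
  nc -vz -w 5 monitoring-logservice.dips.svc 24224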
But the fluent-bit logs never reach fluentd, and I get the following error:
[error] [upstream] connection #81 to monitoring-fluent-bit-dips:24224 timed out after 10 seconds
I have tried several things:
- re-deploying the fluent-bit pods
- re-deploying the fluentd pod
- upgrading fluent-bit from version 1.7.3 to 1.8.10
This is a Kubernetes environment in which fluent-bit was able to communicate with fluentd at a very early stage of the deployment. Apart from that, the same fluent-bit/fluentd versions work when I deploy locally in a docker-desktop environment.
My guesses are:
- fluent-bit cannot keep up with the volume of logs it has to process (in that case, filesystem buffering, sketched below, might help)
- the fluent services are unable to communicate with each other once they have been restarted
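If the first guess is right, fluent-bit can spill buffered chunks to disk instead of holding everything in memory. A minimal sketch of that change, assuming the rest of my config stays as above (the storage.path value is arbitrary):

[SERVICE]
    # persist chunks to disk so memory pressure doesn't stall the pipeline
    storage.path    /var/log/flb-storage/

[INPUT]
    Name            tail
    Path            /var/log/containers/*.log
    storage.type    filesystem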
Does anyone have experience with this, or any ideas on how to debug the issue more deeply?
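One thing I plan to try next is raising fluent-bit's log level so the upstream errors carry more detail. In the [SERVICE] section (a sketch; the HTTP_* keys reflect the monitoring service on port 2020 above):

[SERVICE]
    Log_Level    debug
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020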
Update: below is the description of the running fluentd pod.
Name:         monitoring-logservice-5b8864ffd8-gfpzc
Namespace:    dips
Priority:     0
Node:         sl-sy-k3s-01/10.16.1.99
Start Time:   Mon, 29 Nov 2021 13:09:13 +0530
Labels:       app.kubernetes.io/instance=monitoring
              app.kubernetes.io/name=logservice
              pod-template-hash=5b8864ffd8
Annotations:  kubectl.kubernetes.io/restartedAt: 2021-11-29T12:37:23+05:30
Status:       Running
IP:           10.42.0.143
IPs:
  IP:  10.42.0.143
Controlled By:  ReplicaSet/monitoring-logservice-5b8864ffd8
Containers:
  logservice:
    Container ID:   containerd://102483a7647fd2f10bead187eddf69aa4fad72051d6602dd171e1a373d4209d7
    Image:          our.private.repo/dips/logservice/splunk:1.9
    Image ID:       our.private.repo/dips/logservice/splunk@sha256:531f15f523a251b93dc8a25056f05c0c7bb428241531485a22b94896974e17e8
    Ports:          24231/TCP, 24224/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 29 Nov 2021 13:09:14 +0530
    Ready:          True
    Restart Count:  0
    Liveness:       exec [/bin/healthcheck.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      exec [/bin/healthcheck.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      SOME_ENV_VARS
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from monitoring-logservice-token-g9kwt (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  monitoring-logservice-token-g9kwt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  monitoring-logservice-token-g9kwt
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
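For completeness, the fluentd side listens for forward traffic on 24224 via the in_forward plugin; its source is configured roughly like this (a sketch from memory, port as in the service above):

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>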