I am trying to run fluent-bit inside my Spark container, so that the logs the Spark driver writes to /var/log/sparkDriver.log (controlled by Spark's log4j properties) can be read and forwarded by fluent-bit. I know that running multiple processes in one container is an anti-pattern, but right now I have no choice. What configuration do I need to read this file (/var/log/sparkDriver.log) and forward the logs to our internal Splunk HEC server?
I know fluent-bit can be run as a sidecar in the pod, but I am using plain spark-submit to submit my Spark job to K8s, and spark-submit has no way to tell K8s that I want to run a sidecar (fluent-bit) as well. I also know that fluent-bit can be installed as a DaemonSet in the cluster, which would run on each node and forward container logs from the node to Splunk, but that option is not going to work for me either.
So I thought I could bake fluent-bit (or splunkforwarder, or even fluentd) into my Spark image and have it read the logs from the file or from stdout. I know the latter two options will inflate my Docker image, but I don't have a better option right now. Any help or suggestion would be really appreciated.
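For context, what I have in mind for baking fluent-bit into the image is roughly the following (just a sketch; the base image name, the fluent-bit version tag, and the wrapper script are placeholders for illustration):

```dockerfile
# Sketch: copy the fluent-bit binary from the official image into the Spark image.
# Base image name, version tag and script paths are placeholders.
FROM fluent/fluent-bit:1.3 AS fluentbit

FROM my-spark-image:latest
COPY --from=fluentbit /fluent-bit/bin/fluent-bit /usr/local/bin/fluent-bit
COPY fluent-bit.conf /etc/fluent-bit/fluent-bit.conf

# entrypoint-with-logging.sh would start fluent-bit in the background and
# then exec the original Spark entrypoint in the foreground
COPY entrypoint-with-logging.sh /opt/entrypoint-with-logging.sh
ENTRYPOINT ["/opt/entrypoint-with-logging.sh"]
```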
I actually tried the tail input and the splunk output, but somehow I am not able to figure out the right fluent-bit configuration around them. Here is how my log file (Spark logs via log4j) looks:
20/03/02 19:35:47 INFO TaskSetManager: Starting task 12526.0 in stage 0.0 (TID 12526, 172.16.7.233, executor 1, partition 12526, PROCESS_LOCAL, 7885 bytes)
20/03/02 19:35:47 DEBUG KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Launching task 12526 on executor id: 1 hostname: 172.16.7.233.
20/03/02 19:35:47 INFO TaskSetManager: Finished task 12524.0 in stage 0.0 (TID 12524) in 1 ms on 172.16.7.233 (executor 1) (12525/1000000)
20/03/02 19:35:47 TRACE MessageDecoder: Received message OneWayMessage: OneWayMessage{body=NettyManagedBuffer{buf=CompositeByteBuf(ridx: 5, widx: 1622, cap: 1622, components=2)}}
20/03/02 19:35:47 TRACE MessageDecoder: Received message OneWayMessage: OneWayMessage{body=NettyManagedBuffer{buf=PooledUnsafeDirectByteBuf(ridx: 13, widx: 1630, cap: 32768)}}
20/03/02 19:35:47 TRACE MessageDecoder: Received message OneWayMessage: OneWayMessage{body=NettyManagedBuffer{buf=PooledUnsafeDirectByteBuf(ridx: 13, widx: 2414, cap: 4096)}}
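In case it helps, the lines follow the pattern `yy/MM/dd HH:mm:ss LEVEL Logger: message`. My understanding is that a regex along these lines should split out the fields (a sketch in Python to check the pattern; for fluent-bit it would have to be translated into a `[PARSER]` section with named capture groups):

```python
import re

# Sketch: regex matching the Spark/log4j console pattern
# "yy/MM/dd HH:mm:ss LEVEL Logger: message"
LOG_PATTERN = re.compile(
    r"^(?P<time>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+): "
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Return the structured fields for one log line, or None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

sample = ("20/03/02 19:35:47 INFO TaskSetManager: Starting task 12526.0 "
          "in stage 0.0 (TID 12526, 172.16.7.233, executor 1, ...)")
print(parse_line(sample)["level"])  # INFO
```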
Here is my fluent-bit configuration:
[INPUT]
    Name              tail
    Path              /var/log/sparkDriver.log

# nest the record under the 'event' key
[FILTER]
    Name              nest
    Match             *
    Operation         nest
    Wildcard          *
    Nest_under        event

# add event metadata
[FILTER]
    Name              modify
    Match             *
    Add               index myindex
    Add               host ${HOSTNAME}
    Add               app_name ${APP_NAME}
    Add               namespace ${NAMESPACE}

[OUTPUT]
    Name              splunk
    Match             *
    Host              splunk.example.com
    Port              30000
    Splunk_Token      XXXX-XXXX-XXXX-XXXX
    Splunk_Send_Raw   On
    TLS               On
    TLS.Verify        Off
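For what it's worth, my understanding of what the two filters do (and of what `Splunk_Send_Raw On` then posts to HEC) is the following; this is just a sketch to check my reasoning, and the field values are made up:

```python
import json

# Sketch: mimic what the 'nest' filter (move everything under 'event') and the
# 'modify' filter (add top-level metadata keys) should turn one record into.
# With Splunk_Send_Raw On, fluent-bit posts the record as-is, so this dict
# would be the JSON body sent to the HEC endpoint. Values are illustrative.

def apply_filters(record):
    nested = {"event": dict(record)}   # [FILTER] nest ... Nest_under event
    nested.update({                    # [FILTER] modify ... Add key value
        "index": "myindex",
        "host": "spark-driver-0",
        "app_name": "my-app",
        "namespace": "default",
    })
    return nested

payload = apply_filters(
    {"log": "20/03/02 19:35:47 INFO TaskSetManager: Finished task 12524.0 ..."}
)
print(json.dumps(payload, indent=2))
```

If that shape is wrong for the HEC event endpoint, that may well be where my configuration is failing.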