
I am trying to run fluent-bit inside my Spark driver container, so that the logs the driver writes to /var/log/sparkDriver.log (controlled by Spark's log4j properties) can be read and consumed by fluent-bit. I know that running multiple processes in one container is an anti-pattern, but right now I have no choice. What configuration do I need to read this file (/var/log/sparkDriver.log) and forward the logs to our internal Splunk HEC server?

I know fluent-bit can be used as a sidecar in the pod, but I am using a plain spark-submit to submit my Spark job to K8s, and spark-submit doesn't have any way to tell K8s that I want to run a sidecar (fluent-bit) as well.

I also know that fluent-bit can be installed as a DaemonSet in the cluster, which would run on each node in the K8s cluster and forward logs from the containers via the node to Splunk. But this option is also not going to work for me.

So I thought I could bake fluent-bit (or the Splunk universal forwarder, or even fluentd) into my Spark image and read the logs from the file or from stdout. I know the other two options will inflate my Spark Docker image, but I don't have an option right now.
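For the baked-in approach, the container entrypoint has to start both processes. A minimal sketch, assuming fluent-bit is installed in the image at /usr/local/bin/fluent-bit with its config at /etc/fluent-bit/fluent-bit.conf (both paths are assumptions, adjust to your image):

    #!/bin/sh
    # Hypothetical entrypoint: start fluent-bit in the background,
    # then exec the Spark driver so it becomes PID 1 and receives
    # signals (SIGTERM on pod deletion) directly.
    /usr/local/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf &
    exec /opt/spark/bin/spark-submit "$@"

The trade-off of this sketch is that nothing restarts fluent-bit if it dies; a supervisor (e.g. tini plus a small wrapper) would be more robust, at the cost of more moving parts in the image.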

Any help or suggestion would be really appreciated.

I actually tried the tail input and the splunk output, but somehow I am not able to figure out the right fluent-bit configuration. Here is how my log file looks (Spark logs via log4j):

20/03/02 19:35:47 INFO TaskSetManager: Starting task 12526.0 in stage 0.0 (TID 12526, 172.16.7.233, executor 1, partition 12526, PROCESS_LOCAL, 7885 bytes)
20/03/02 19:35:47 DEBUG KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Launching task 12526 on executor id: 1 hostname: 172.16.7.233.
20/03/02 19:35:47 INFO TaskSetManager: Finished task 12524.0 in stage 0.0 (TID 12524) in 1 ms on 172.16.7.233 (executor 1) (12525/1000000)
20/03/02 19:35:47 TRACE MessageDecoder: Received message OneWayMessage: OneWayMessage{body=NettyManagedBuffer{buf=CompositeByteBuf(ridx: 5, widx: 1622, cap: 1622, components=2)}}
20/03/02 19:35:47 TRACE MessageDecoder: Received message OneWayMessage: OneWayMessage{body=NettyManagedBuffer{buf=PooledUnsafeDirectByteBuf(ridx: 13, widx: 1630, cap: 32768)}}
20/03/02 19:35:47 TRACE MessageDecoder: Received message OneWayMessage: OneWayMessage{body=NettyManagedBuffer{buf=PooledUnsafeDirectByteBuf(ridx: 13, widx: 2414, cap: 4096)}}
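This layout ("yy/MM/dd HH:mm:ss LEVEL Class: message") can be split into fields with a regex-based fluent-bit parser. As a sketch, the regex below is an assumption about the exact log4j pattern; it can be sanity-checked against a sample line in plain Python before wiring it into a parsers file:

```python
import re

# Hypothetical regex for the "yy/MM/dd HH:mm:ss LEVEL Class: message"
# layout shown above; the named groups map onto fluent-bit parser fields.
LOG_RE = re.compile(
    r"^(?P<time>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<class>[^:]+): (?P<message>.*)$"
)

sample = ("20/03/02 19:35:47 INFO TaskSetManager: Starting task 12526.0 "
          "in stage 0.0 (TID 12526, 172.16.7.233, executor 1)")
m = LOG_RE.match(sample)
print(m.group("level"))   # INFO
print(m.group("class"))   # TaskSetManager
```

In fluent-bit terms this would become a [PARSER] entry (Format regex, Time_Key time, Time_Format %y/%m/%d %H:%M:%S) in a parsers file, referenced from the tail input with a Parser key.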

Here is my fluent-bit configuration:

[INPUT]
    Name  tail
    Path  /var/log/sparkDriver.log

# nest the record under the 'event' key
[FILTER]
    Name nest
    Match *
    Operation nest
    Wildcard *
    Nest_under event

# add event metadata
[FILTER]
    Name      modify
    Match     *
    Add index myindex
    Add host  ${HOSTNAME}
    Add app_name ${APP_NAME}
    Add namespace ${NAMESPACE}

[OUTPUT]
    Name        Splunk
    Match       *
    Host        splunk.example.com
    Port        30000
    Splunk_Token XXXX-XXXX-XXXX-XXXX
    Splunk_Send_Raw On
    TLS         On
    TLS.Verify  Off
devnull

1 Answer


The tail input (https://docs.fluentbit.io/manual/input/tail) and the splunk output plugin (https://docs.fluentbit.io/manual/output/splunk) should do the trick for you.

Are you facing any specific issue with configuring these two?
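For reference, a minimal tail-to-Splunk sketch (host, port, and token are placeholders, and spark_log4j is an assumed parser entry in the referenced parsers file):

    [SERVICE]
        Parsers_File parsers.conf

    [INPUT]
        Name   tail
        Path   /var/log/sparkDriver.log
        Parser spark_log4j

    [OUTPUT]
        Name         splunk
        Match        *
        Host         splunk.example.com
        Port         8088
        Splunk_Token XXXX-XXXX-XXXX-XXXX
        TLS          On
        TLS.Verify   Off

Note that with Splunk_Send_Raw left at its default (Off), the splunk output builds the HEC event envelope itself, so a nest filter wrapping the record under an "event" key is only needed when sending raw.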

Tummala Dhanvi