I am trying to run a sample Spark job on Kubernetes by following the steps described here: https://spark.apache.org/docs/latest/running-on-kubernetes.html.
I am trying to send the Spark driver and executor logs to Splunk. Does Spark provide any configuration for this? How do I pass the Splunk settings (HEC endpoint, port, token, etc.) in the spark-submit command?
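From the documentation, the only Spark-side hook I can find for log routing is the log4j configuration. A minimal sketch of what I imagine would be added to the spark-submit command below (log4j-splunk.properties is a hypothetical file I would have to bake into the container image, together with a log4j 1.x appender that can post to the Splunk HEC, since Spark 2.4 does not ship one):

--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j-splunk.properties"
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j-splunk.properties"

But that only changes which log4j config is loaded; it doesn't answer where the HEC endpoint, token, etc. should go.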
I did try passing them as args to the Spark driver:
bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.JavaSparkPi \
--master k8s://http://127.0.0.1:8001 \
--conf spark.executor.instances=2 \
--conf spark.app.name=spark-pi \
--conf spark.kubernetes.container.image=gcr.io/spark-operator/spark:v2.4.4 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=<account> \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.namespace=default \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar \
--log-driver=splunk \
--log-opt splunk-url=<url:port> \
--log-opt splunk-token=<token> \
--log-opt splunk-index=<index> \
--log-opt splunk-sourcetype=<sourceType> \
--log-opt splunk-format=json
However, the logs were not forwarded to the desired index.
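As far as I can tell, --log-driver and --log-opt are options of Docker's Splunk logging driver (i.e., docker run flags), so spark-submit presumably just forwards them to the application as arguments, which would explain why they had no effect. If so, the equivalent would be configuring the logging driver on each node's Docker daemon, roughly like this in /etc/docker/daemon.json (placeholder values; this only applies if the Kubernetes nodes use the Docker runtime):

{
  "log-driver": "splunk",
  "log-opts": {
    "splunk-url": "<url:port>",
    "splunk-token": "<token>",
    "splunk-index": "<index>",
    "splunk-sourcetype": "<sourceType>",
    "splunk-format": "json"
  }
}

That is a node-level setting, though; I am looking for a per-job, Spark-level option if one exists.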
I am using Spark version 2.4.4 to run spark-submit.
Thanks in advance for any inputs!