We are running Spark jobs on Apache Hadoop YARN, and I need to use the "LD_PRELOAD trick" on these jobs. (Before anyone panics: this isn't for production runs; it's part of automated job testing.)
I know how to submit additional files with the job, and I know how to set environment variables on the nodes, so adding these settings to spark-defaults.conf
almost provides a solution:
spark.files=/home/todd/pwn_connect.so
spark.yarn.appMasterEnv.LD_PRELOAD=pwn_connect.so
spark.executorEnv.LD_PRELOAD=pwn_connect.so
But I get this error in the container logs:
ERROR: ld.so: object 'pwn_connect.so' from LD_PRELOAD cannot be preloaded: ignored.
The problem seems to be that LD_PRELOAD doesn't resolve the bare filename I'm providing against the working directory. But I don't know how to supply an absolute path instead -- I have no idea where on the local filesystem of the nodes these files end up.