
We are running Spark jobs on Apache Hadoop YARN. I have a special need to use the "LD_PRELOAD trick" on these jobs. (Before anyone panics, it's not for production runs; this is part of automated job testing).

I know how to submit additional files with the job, and I know how to set environment variables on the nodes, so adding these settings to spark-defaults.conf almost provides a solution:

spark.files=/home/todd/pwn_connect.so
spark.yarn.appMasterEnv.LD_PRELOAD=pwn_connect.so
spark.executorEnv.LD_PRELOAD=pwn_connect.so

But I get this error in the container logs:

ERROR: ld.so: object 'pwn_connect.so' from LD_PRELOAD cannot be preloaded: ignored.

The problem seems to be that LD_PRELOAD doesn't accept the relative path that I'm providing. But I don't know how to provide an absolute path -- I don't have a clue where on the local filesystem of the nodes these files are being placed.

Todd Owen

2 Answers


Firstly, spark.files is not used when running on YARN; use spark.yarn.dist.files instead. Note that this setting will be overridden if the --files argument is passed to spark-submit.

For LD_PRELOAD, there are two solutions that will work:

  1. Relative paths can be used; they need to be prefixed with ./:

    spark.yarn.dist.files=/home/todd/pwn_connect.so
    spark.yarn.appMasterEnv.LD_PRELOAD=./pwn_connect.so
    spark.executorEnv.LD_PRELOAD=./pwn_connect.so
    

    (relative paths without ./ are searched for in LD_LIBRARY_PATH, rather than the current working directory).

  2. If an absolute path is preferred, examining the Spark source code reveals that the whole command line, including environment variable assignments, is subject to expansion by the shell, so the expression $PWD will be expanded to the container's current working directory:

    spark.yarn.dist.files=/home/todd/pwn_connect.so
    spark.yarn.appMasterEnv.LD_PRELOAD=$PWD/pwn_connect.so
    spark.executorEnv.LD_PRELOAD=$PWD/pwn_connect.so
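The same absolute-path approach can also be passed as spark-submit flags. A sketch (your-app.jar is a placeholder; the single quotes are an assumption on my part to keep the local shell from expanding $PWD before it reaches the container's command line, where YARN's launch script expands it instead):

```shell
spark-submit \
  --conf spark.yarn.dist.files=/home/todd/pwn_connect.so \
  --conf 'spark.yarn.appMasterEnv.LD_PRELOAD=$PWD/pwn_connect.so' \
  --conf 'spark.executorEnv.LD_PRELOAD=$PWD/pwn_connect.so' \
  your-app.jar   # placeholder application jar
```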
    
Todd Owen

I had a similar problem for a year and a half and tried several approaches, none of which worked, until I saw this answer. Thank you.

--conf spark.yarn.dist.files=/usr/lib64/libopenblas64.so \
--conf spark.yarn.appMasterEnv.LD_PRELOAD=./libopenblas64.so \
--conf spark.executorEnv.LD_PRELOAD=./libopenblas64.so \
Prabodh Mhalgi