
For Spark jobs, we are trying to add a logging framework that creates a custom log file on the local FS. In client mode everything is fine: the files are created on the local FS as the user who launched spark-submit. In cluster mode, however, the local files are created as the user yarn, which does not have permission to write to the local directory...

Is there any way to write a local file in cluster mode as the user who submitted the job, without changing the permissions to 777 everywhere? Is cluster mode the better choice in this case (we are in a PROD environment), given that the job is launched from a node of the cluster (so there is no network issue)?

Thank you.

1 Answer


Yes, here is a way: use a shell script to submit the Spark jobs.

We use a logger to print all our logs, and we always prefix the log message with a unique text, e.g. log.info("INFO_CUSTOM: Info message"). Once the application has completed, we run the yarn logs command and grep for the unique text.

  1. Get the application ID by listing applications with the yarn command and filtering on the application name.

e.g. yarn application -list -appStates FINISHED,FAILED,KILLED | grep <application name>

  2. Run the yarn logs command, grep for the unique text, and redirect the output to the file you want.

e.g. yarn logs -applicationId <application id from step 1> | grep -w "INFO_CUSTOM" >> joblog.log
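The two steps could be wrapped in a small script, roughly like the sketch below. It assumes the yarn CLI is on the PATH and that log aggregation is enabled. The function names (extract_app_id, filter_tagged, collect_logs) and the job name in the usage line are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: collect custom-tagged log lines from a finished YARN application.
# Function names and arguments are hypothetical, not part of any YARN tooling.

# Read `yarn application -list` output on stdin and print the application ID
# (first column) of the first row that mentions the given application name.
extract_app_id() {
  local app_name="$1"
  awk -v name="$app_name" '$1 ~ /^application_/ && index($0, name) { print $1; exit }'
}

# Keep only the lines on stdin that contain the tag as a whole word.
filter_tagged() {
  local tag="$1"
  grep -w -- "$tag"
}

# Step 1 and step 2 combined: look up the ID, then append matching lines to a file.
collect_logs() {
  local app_name="$1" tag="$2" out_file="$3" app_id
  app_id="$(yarn application -list -appStates FINISHED,FAILED,KILLED | extract_app_id "$app_name")"
  if [ -z "$app_id" ]; then
    echo "no finished application found for name: $app_name" >&2
    return 1
  fi
  yarn logs -applicationId "$app_id" | filter_tagged "$tag" >> "$out_file"
}
```

Usage, after sourcing the script: collect_logs my_spark_job INFO_CUSTOM joblog.log. If several finished applications share the name, this sketch takes the first row that yarn prints, which may not be the most recent run.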

Sathiyan S
  • Hello, the problem is not how to retrieve YARN logs but how to write functional logs to a local FS in cluster mode. – Amir haroun Jul 02 '20 at 15:16
  • @Amirharoun all your functional logs will be written on the actual worker node that runs the code. All those logs are then aggregated by YARN and stored in an HDFS directory. If you run the yarn logs command it will show all the logs, and you can extract what you are interested in from them. – Sathiyan S Jul 02 '20 at 15:57
  • The logs are not the problem; the problem is how to write a file to a local file system as the user who submitted the job. When you launch it in cluster mode, it is the user yarn that tries to write the file, and we do not want to give 777 access to all the directories... – Amir haroun Jul 02 '20 at 16:00
  • Your question says log file, so I interpreted it the other way! – Sathiyan S Jul 02 '20 at 16:06
  • Are you writing from the driver node? If yes, in cluster mode any node may run the driver, and that node is going to be your local FS. Curious to know: what is the use of writing files to an unknown node? – Sathiyan S Jul 02 '20 at 16:10