I have a trivially small Spark application written in Java that I am trying to run in a K8s cluster using spark-submit. I built an image with Spark binaries, my uber-JAR file with all necessary dependencies (in /opt/spark/jars/my.jar), and a config file (in /opt/spark/conf/some.json).

In my code, I start with

SparkSession session = SparkSession.builder()
    .appName("myapp")
    .config("spark.logConf", "true")
    .getOrCreate();

Path someFilePath = FileSystems.getDefault().getPath("/opt/spark/conf/some.json");
String someString = new String(Files.readAllBytes(someFilePath));

and get this exception at readAllBytes from the Spark driver:

java.nio.file.NoSuchFileException: /opt/spark/conf/some.json

If I run my Docker image manually I can definitely see the file /opt/spark/conf/some.json as I expect. My Spark job runs as root so file permissions should not be a problem.

I have been assuming that, since the same Docker image, with the file indeed present, will be used to start the driver (and executors, but I don't even get to that point), the file should be available to my application. Is that not so? Why wouldn't it see the file?

Rico
mustaccio

1 Answer

You seem to get this exception from one of your worker nodes, not from the container.

Make sure that every file your job needs is passed to spark-submit with the --files option.

spark-submit --master yarn --deploy-mode cluster --files <local files dependencies> ...

https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
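Before changing how the file is shipped, it may help to log what the driver process actually sees at that path. The sketch below uses only java.nio and is not part of Spark; the class name ConfigProbe and the method describe are hypothetical names chosen for illustration. It reports whether the file exists and lists the contents of its parent directory, which would show whether /opt/spark/conf at runtime differs from what is in the image, for example if something is mounted over that directory when the pod starts (an assumption, not something confirmed in this thread).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic helper, not a Spark API.
final class ConfigProbe {

    // Returns a one-line report: does the file exist, and what does its parent directory hold?
    static String describe(Path file) {
        Path absolute = file.toAbsolutePath();
        StringBuilder report = new StringBuilder()
                .append("path=").append(absolute)
                .append(" exists=").append(Files.exists(absolute));

        Path parent = absolute.getParent();
        if (parent == null || !Files.isDirectory(parent)) {
            return report.append(" parent=<missing>").toString();
        }

        // Files.list returns a lazy stream that must be closed, hence try-with-resources.
        try (Stream<Path> entries = Files.list(parent)) {
            String names = entries
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.joining(", "));
            report.append(" parentContents=[").append(names).append("]");
        } catch (IOException e) {
            report.append(" parentContents=<unreadable: ").append(e).append(">");
        }
        return report.toString();
    }
}
```

Printing ConfigProbe.describe(Paths.get("/opt/spark/conf/some.json")) just before the readAllBytes call would put the directory listing in the driver log, next to the exception.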

andreoss
  • Not sure what you mean by "from one of your worker nodes, not from the container" -- the exception appears in the Spark driver log. And, given that the file in question is a part of the Docker image, I am under the impression that `--files` is unnecessary... – mustaccio Jul 13 '20 at 00:16
  • @mustaccio How is your application deployed? Driver is not necessarily run inside your docker container, it could be – andreoss Jul 13 '20 at 00:30