I have a spark-submit script as follows:
spark-submit \
--name daily_job \
--class com.test.Bootstrapper \
--files /home/user/*.csv \
--conf spark.executor.memory=2g \
--conf spark.executor.cores=2 \
--master spark://172.17.0.4:7077 \
--deploy-mode client \
--packages com.typesafe:config:1.3.1 \
file:///home/user/workspace/spark-test/target/spark-test-0.1-SNAPSHOT.jar
Cluster configuration: a master and 2 workers, each in a different container.
After the job starts, I can see that the CSV files are placed in the following locations:
Worker:
/usr/local/spark-2.0.2-bin-hadoop2.7/work/app-20170116160937-0036/0/test.csv
Driver:
/tmp/spark-f65b2466-e419-49bd-8da7-9f2b94cbf870/userFiles-abb14b33-58b1-47d6-935e-6c2943e3d55c/test.csv
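For reference, I assume these locations are what the SparkFiles API reports for files shipped with --files. A minimal check I could run on the driver (my assumption; org.apache.spark.SparkFiles and the name "test.csv" as registered by --files):

import org.apache.spark.SparkFiles

// Print the root directory where Spark keeps files distributed via --files.
// On the driver this should match the /tmp/spark-.../userFiles-... path above.
println(SparkFiles.getRootDirectory())
// Resolve the absolute local path of one distributed file by name.
println(SparkFiles.get("test.csv"))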
The question is: how do I properly read this file? Currently I am doing the following:
private var initial: DataFrame = spark.sqlContext.read
.option("mode", "DROPMALFORMED")
.option("delimiter", conf.delimiter)
.option("dateFormat", conf.dateFormat)
.schema(conf.schema)
.csv("file:///*.csv")
This results in a FileNotFoundException.
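Is SparkFiles the intended way to resolve these paths before reading? Here is a minimal sketch of what I have in mind (an assumption on my part, not verified; "test.csv" is the file name passed to --files, and the delimiter/date format values are placeholders for my conf settings):

import org.apache.spark.SparkFiles
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().getOrCreate()

// Resolve the driver-local copy of the file distributed via --files.
val localCsv = SparkFiles.get("test.csv")

// Read it with the same options as above; "file://" forces the local filesystem.
val initial: DataFrame = spark.read
  .option("mode", "DROPMALFORMED")
  .option("delimiter", ";")           // placeholder for conf.delimiter
  .option("dateFormat", "yyyy-MM-dd") // placeholder for conf.dateFormat
  .csv("file://" + localCsv)

I am not sure whether this path is also valid on the executors, since the worker copy sits under the application's work directory rather than under /tmp, so I'd appreciate clarification on the right approach.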