I want to store Spark arguments such as the input file and output file in a Java properties file and pass that file to the Spark driver. I'm using spark-submit to submit the job, but couldn't find a parameter to pass the properties file. Have you got any suggestions?
have you tried this option: --properties-file FILE (path to a file from which to load extra properties) – vijay kumar Jun 29 '15 at 14:40
2 Answers
Here I found one solution:
Props file (mypropsfile.conf). Note: prefix your keys with "spark.", else the props will be ignored.
spark.myapp.input /input/path
spark.myapp.output /output/path
Launch:
$SPARK_HOME/bin/spark-submit --properties-file mypropsfile.conf
How to access them in code:
sc.getConf.get("spark.driver.host") // localhost
sc.getConf.get("spark.myapp.input") // /input/path
sc.getConf.get("spark.myapp.output") // /output/path

@ramisetty.vijay: should the file extension be .conf, or can we use .properties as well? – Shankar Sep 22 '15 at 17:21
It worked for a .properties file also, but the format inside should be spark.my.key \t myvalue – vijay kumar Sep 23 '15 at 09:36
Warning: the use of --properties-file overwrites any previously defined spark-defaults.conf (http://spark.apache.org/docs/latest/submitting-applications.html), so it may be necessary to create your own merged version. – ChristopherB Nov 18 '15 at 16:02
Using --properties-file can override any settings defined in spark-defaults.conf, which can differ per environment. Do you need to ensure that the properties file is a merged version? – Kans Nov 01 '18 at 01:29
The previous answer's approach has the restriction that every property in the property file must start with spark., e.g.
spark.myapp.input
spark.myapp.output
Suppose you have a property which doesn't start with spark.:
job.property:
app.name=xyz
$SPARK_HOME/bin/spark-submit --properties-file job.property
Spark will ignore all properties that don't have the spark. prefix, with this message:
Warning: Ignoring non-spark config property: app.name=xyz
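A quick sketch of what this means from the driver (assuming sc is a JavaSparkContext; the key names follow the job.property example above):

// app.name was not prefixed with "spark.", so spark-submit dropped it:
sc.getConf().contains("app.name");   // false
// spark.-prefixed keys survive (spark.app.name is set by Spark itself):
sc.getConf().get("spark.app.name");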
How I manage the properties file in the application's driver and executors:
${SPARK_HOME}/bin/spark-submit --files job.properties
Java code to access the cached file (job.properties):
import java.io.InputStream;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkFiles;

// Load the file into a Properties object using the HDFS FileSystem.
// SparkFiles.get returns the absolute path of a file shipped via --files.
String fileName = SparkFiles.get("job.properties");
Configuration hdfsConf = new Configuration();
FileSystem fs = FileSystem.get(hdfsConf);
InputStream is = fs.open(new Path(fileName));

// Or use plain Java IO instead:
// InputStream is = new java.io.FileInputStream(fileName);

Properties prop = new Properties();
// Load the properties
prop.load(is);
// Retrieve a property
prop.getProperty("app.name");
If you have environment-specific properties (dev/test/prod), then supply a custom APP_ENV Java system property in spark-submit:
${SPARK_HOME}/bin/spark-submit \
  --conf "spark.driver.extraJavaOptions=-DAPP_ENV=dev" \
  --conf "spark.executor.extraJavaOptions=-DAPP_ENV=dev" \
  --files dev.properties
Then in your driver or executor code:
// Load the file into a Properties object using the HDFS FileSystem
String fileName = SparkFiles.get(System.getProperty("APP_ENV") + ".properties");
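Putting the pieces together, a hedged sketch of a small helper (PropsLoader and loadProps are hypothetical names, not part of any Spark API) that both driver and executors can call:

import java.io.InputStream;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkFiles;

public final class PropsLoader { // hypothetical helper class
    // Requires -DAPP_ENV=... in spark.driver/executor.extraJavaOptions
    // and the matching <env>.properties shipped via --files.
    public static Properties loadProps() throws Exception {
        String env = System.getProperty("APP_ENV");
        String fileName = SparkFiles.get(env + ".properties");
        FileSystem fs = FileSystem.get(new Configuration());
        Properties prop = new Properties();
        try (InputStream is = fs.open(new Path(fileName))) {
            prop.load(is);
        }
        return prop;
    }
}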

In my case, I was using HDFS. You may use Java IO to read the properties file as well. – Rahul Sharma Jun 01 '18 at 04:29