
I want to load a properties config file when submitting a Spark job, so that I can load the proper configuration for each environment, such as a test environment or a production environment. But I don't know where to put the properties file. Here is the code that loads the properties file:

import java.io.FileInputStream
import java.util.Properties

import scala.util.{Failure, Success, Try}

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HbaseRDD {

  val QUORUM_DEFAULT = "172.16.1.10,172.16.1.11,172.16.1.12"
  val TIMEOUT_DEFAULT = "120000"

  // Load hbase.properties from the current working directory; if the file
  // cannot be read, fall back to the defaults above.
  val config = Try {
    val prop = new Properties()
    prop.load(new FileInputStream("hbase.properties"))
    (
      prop.getProperty("hbase.zookeeper.quorum", QUORUM_DEFAULT),
      prop.getProperty("timeout", TIMEOUT_DEFAULT)
    )
  }

  def getHbaseRDD(tableName: String, appName: String = "test", master: String = "spark://node0:7077") = {
    val sparkConf = new SparkConf().setAppName(appName).setMaster(master)
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()

    config match {
      case Success((quorum, timeout)) =>
        conf.set("hbase.zookeeper.quorum", quorum)
        conf.set("timeout", timeout)
      case Failure(ex) =>
        ex.printStackTrace()
        conf.set("hbase.zookeeper.quorum", QUORUM_DEFAULT)
        conf.set("timeout", TIMEOUT_DEFAULT)
    }
    conf.set(TableInputFormat.INPUT_TABLE, tableName)
    val hbaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
    hbaseRDD
  }

}
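For context, this is roughly how I use it (the table name here is just a placeholder):

// "test_table" is a placeholder; any existing HBase table name works.
val rdd = HbaseRDD.getHbaseRDD("test_table")
println(s"row count: ${rdd.count()}")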

The question is: where should I put the hbase.properties file so that Spark can find and load it? Or how do I specify it via spark-submit?

armnotstrong
  • So which of the argument tricks worked for you? Where did you place the properties file? – Ram Ghadiyaram Sep 09 '16 at 08:40
  • Possible duplicate of [How to pass -D parameter or environment variable to Spark job?](https://stackoverflow.com/questions/28166667/how-to-pass-d-parameter-or-environment-variable-to-spark-job) – Alex K Jun 07 '18 at 08:40

1 Answer


Please follow this example configuration (Spark 1.5):

  • The file can be placed in the working directory from which you submit the Spark job (this is what we used); see the sketch after this list.
  • Another approach is to keep it in HDFS.
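For the first option, here is a minimal sketch (not taken from the original code) of how the file could be resolved at runtime, assuming it is shipped with --files hbase.properties; note that SparkFiles.get only works once a SparkContext has been created:

import java.io.{File, FileInputStream}
import java.util.Properties

import scala.util.Try

import org.apache.spark.SparkFiles

// Resolve hbase.properties either from the current working directory
// (the submit directory, or the YARN container directory when shipped
// via --files) or from the path Spark recorded for a --files entry.
def loadHbaseProps(name: String = "hbase.properties"): Try[Properties] = Try {
  val local = new File(name)
  val path =
    if (local.exists()) local.getAbsolutePath
    else SparkFiles.get(name)   // requires an active SparkContext
  val props = new Properties()
  props.load(new FileInputStream(path))
  props
}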

Check the Runtime Environment configuration options. These options change from one Spark version to another, so consult the runtime configuration documentation for the version you are using.

spark-submit --verbose --class <your driver class> \
--master yarn-client \
--num-executors 12 \
--driver-memory 1G \
--executor-memory 2G \
--executor-cores 4 \
--conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+UseSerialGC -XX:+UseCompressedOops -XX:+UseCompressedStrings -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:PermSize=256M -XX:MaxPermSize=512M" \
--conf "spark.driver.extraJavaOptions=-XX:PermSize=256M -XX:MaxPermSize=512M" \
--conf "spark.shuffle.memoryFraction=0.5" \
--conf "spark.worker.cleanup.enabled=true" \
--conf "spark.worker.cleanup.interval=3600" \
--conf "spark.shuffle.io.numConnectionsPerPeer=5" \
--conf "spark.eventLog.enabled=true" \
--conf "spark.driver.extraLibraryPath=$HADOOP_HOME/*:$HBASE_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
--conf "spark.executor.extraLibraryPath=$HADOOP_HOME/*:$folder/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
--conf "spark.executor.extraClassPath=$OTHER_JARS:hbase.properties" \
--conf "spark.yarn.executor.memoryOverhead=2048" \
--conf "spark.yarn.driver.memoryOverhead=1024" \
--conf "spark.eventLog.overwrite=true" \
--conf "spark.shuffle.consolidateFiles=true" \
--conf "spark.akka.frameSize=1024" \
--properties-file yourconfig.conf \
--files hbase.properties \
--jars $your_JARS
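With --files hbase.properties, Spark copies the file into each YARN container's working directory, so opening it by its bare relative name (as the code in the question does) should work on the executors and, in cluster mode, on the driver; in yarn-client mode the driver reads it from the directory where spark-submit was run.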


Ram Ghadiyaram