
I need to load the com.databricks:spark-csv package dynamically into my application. Using spark-submit with the --packages argument, it works:

spark-submit --class "DataLoaderApp" --master yarn \
             --deploy-mode client \
             --packages com.databricks:spark-csv_2.11:1.4.0 \
             target/scala-2.10/ra-etl_2.10-1.0.0.jar LOAD GGSN /data-sources/DXE_Ver/1_4/MTN_Abuja/GGSN/20160221/GGSN_0_20160221074731.dat

but when I use

spark-submit --class "DataLoaderApp" --master yarn \
             --deploy-mode client \
             target/scala-2.10/ra-etl_2.10-1.0.0.jar LOAD GGSN /data-sources/DXE_Ver/1_4/MTN_Abuja/GGSN/20160221/GGSN_0_20160221074731.dat

with the configuration below, it doesn't work:

val conf = new SparkConf()
                .setAppName("Data Loader Application")
                .set("spark.jar.packages","com.databricks:spark-csv_2.11:1.4.0")
Mahdi

1 Answer


Using spark-csv via SparkConf seems to still be an open issue. Nevertheless, if your purpose is simply to avoid typing the --packages argument every time you call spark-submit, you can add the relevant dependencies to your spark-defaults.conf file (normally located in your $SPARK_HOME/conf directory) as follows:

  1. Locate the paths of spark-csv_2.11-1.4.0.jar and its dependencies, commons-csv-1.1.jar and univocity-parsers-1.5.1.jar. These should already be present on your system if you have used spark-csv before; in my case (user ctsats), these paths are:

    /home/ctsats/.ivy2/cache/com.databricks/spark-csv_2.11/jars/spark-csv_2.11-1.4.0.jar
    /home/ctsats/.ivy2/cache/org.apache.commons/commons-csv/jars/commons-csv-1.1.jar
    /home/ctsats/.ivy2/cache/com.univocity/univocity-parsers/jars/univocity-parsers-1.5.1.jar
    
  2. Open the spark-defaults.conf file (in $SPARK_HOME/conf; create it if it does not exist) and add the above paths under the spark.driver.extraClassPath setting, i.e. if your paths are as above, add the following line:

    spark.driver.extraClassPath /home/ctsats/.ivy2/cache/com.databricks/spark-csv_2.11/jars/spark-csv_2.11-1.4.0.jar:/home/ctsats/.ivy2/cache/org.apache.commons/commons-csv/jars/commons-csv-1.1.jar:/home/ctsats/.ivy2/cache/com.univocity/univocity-parsers/jars/univocity-parsers-1.5.1.jar
    
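To be clear about the format in step 2: the value of spark.driver.extraClassPath is a single string of jar paths joined by ':' (the classpath separator on Linux). A small sketch of assembling that string from the paths in step 1 (adjust the home directory for your own user):

```python
# Jar paths from step 1 (assumption: the ~/.ivy2 cache locations shown above)
jars = [
    "/home/ctsats/.ivy2/cache/com.databricks/spark-csv_2.11/jars/spark-csv_2.11-1.4.0.jar",
    "/home/ctsats/.ivy2/cache/org.apache.commons/commons-csv/jars/commons-csv-1.1.jar",
    "/home/ctsats/.ivy2/cache/com.univocity/univocity-parsers/jars/univocity-parsers-1.5.1.jar",
]

# Join with ':' to form the single-line value expected in spark-defaults.conf
classpath = ":".join(jars)
print("spark.driver.extraClassPath " + classpath)
```

This simply reproduces the one-line setting shown above; the point is that all jars go into one colon-separated value, not one setting per jar.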

Now, the spark-csv package will be automatically loaded whenever you call spark-submit or spark-shell.
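As a side note, the property name documented by Spark is spark.jars.packages (plural "jars"), not spark.jar.packages as in the question. Depending on your Spark version, you may also be able to set it once in spark-defaults.conf instead of listing jar paths explicitly (a sketch; whether this property is honoured at submit time depends on your Spark version):

```
spark.jars.packages  com.databricks:spark-csv_2.11:1.4.0
```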

desertnaut