
I need to load the com.databricks:spark-csv package dynamically into my application. Using spark-submit with the --packages argument, it works:

spark-submit --class "DataLoaderApp" --master yarn \
             --deploy-mode client \
             --packages com.databricks:spark-csv_2.11:1.4.0 \
             target/scala-2.10/ra-etl_2.10-1.0.0.jar LOAD GGSN /data-sources/DXE_Ver/1_4/MTN_Abuja/GGSN/20160221/GGSN_0_20160221074731.dat

but when I use

spark-submit --class "DataLoaderApp" --master yarn \
             --deploy-mode client \
             target/scala-2.10/ra-etl_2.10-1.0.0.jar LOAD GGSN /data-sources/DXE_Ver/1_4/MTN_Abuja/GGSN/20160221/GGSN_0_20160221074731.dat

with the configuration below, it doesn't work:

val conf = new SparkConf()
                .setAppName("Data Loader Application")
                .set("spark.jar.packages","com.databricks:spark-csv_2.11:1.4.0")
Mahdi

1 Answer


Using spark-csv via SparkConf seems to still be an open issue. Nevertheless, if your purpose is simply to avoid typing the --packages argument every time you call spark-submit, you can add the relevant dependencies to your spark-defaults.conf file (normally located in your $SPARK_HOME/conf directory) as follows:

  1. Locate the paths of spark-csv_2.11-1.4.0.jar and its dependencies, commons-csv-1.1.jar and univocity-parsers-1.5.1.jar. These should already be present on your system if you have used spark-csv before; in my case (user ctsats), these paths are:

    /home/ctsats/.ivy2/cache/com.databricks/spark-csv_2.11/jars/spark-csv_2.11-1.4.0.jar
    /home/ctsats/.ivy2/cache/org.apache.commons/commons-csv/jars/commons-csv-1.1.jar
    /home/ctsats/.ivy2/cache/com.univocity/univocity-parsers/jars/univocity-parsers-1.5.1.jar
    
  2. Open the spark-defaults.conf file (in $SPARK_HOME/conf; create it if it does not exist) and add the above paths under the spark.driver.extraClassPath setting, i.e. if your paths are as above, add the following line:

    spark.driver.extraClassPath /home/ctsats/.ivy2/cache/com.databricks/spark-csv_2.11/jars/spark-csv_2.11-1.4.0.jar:/home/ctsats/.ivy2/cache/org.apache.commons/commons-csv/jars/commons-csv-1.1.jar:/home/ctsats/.ivy2/cache/com.univocity/univocity-parsers/jars/univocity-parsers-1.5.1.jar
    
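To be clear about the format in step 2: the value of spark.driver.extraClassPath is a single string of jar paths joined by ':' (the classpath separator on Linux). A small sketch of assembling that string from the paths in step 1 (adjust the home directory for your own user):

```python
# Jar paths from step 1 (assumption: the ~/.ivy2 cache locations shown above)
jars = [
    "/home/ctsats/.ivy2/cache/com.databricks/spark-csv_2.11/jars/spark-csv_2.11-1.4.0.jar",
    "/home/ctsats/.ivy2/cache/org.apache.commons/commons-csv/jars/commons-csv-1.1.jar",
    "/home/ctsats/.ivy2/cache/com.univocity/univocity-parsers/jars/univocity-parsers-1.5.1.jar",
]

# Join with ':' to form the single-line value expected in spark-defaults.conf
classpath = ":".join(jars)
print("spark.driver.extraClassPath " + classpath)
```

This simply reproduces the one-line setting shown above; the point is that all jars go into one colon-separated value, not one setting per jar.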

Now, the spark-csv package will be automatically loaded whenever you call spark-submit or spark-shell.
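As a side note, the property name documented by Spark is spark.jars.packages (plural "jars"), not spark.jar.packages as in the question. Depending on your Spark version, you may also be able to set it once in spark-defaults.conf instead of listing jar paths explicitly (a sketch; whether this property is honoured at submit time depends on your Spark version):

```
spark.jars.packages  com.databricks:spark-csv_2.11:1.4.0
```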

desertnaut