
I'm trying to automatically include jars to my PySpark classpath. Right now I can type the following command and it works:

$ pyspark --jars /path/to/my.jar

I'd like to have that jar included by default so that I can just type pyspark and also use it in IPython Notebook.

I've read that I can include the argument by setting PYSPARK_SUBMIT_ARGS in the environment:

export PYSPARK_SUBMIT_ARGS="--jars /path/to/my.jar"

Unfortunately the above doesn't work. I get the runtime error Failed to load class for data source.
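Side note: on later Spark releases, PYSPARK_SUBMIT_ARGS is typically expected to end with the command name (e.g. pyspark-shell), so a minimal sketch for that case would be:

export PYSPARK_SUBMIT_ARGS="--jars /path/to/my.jar pyspark-shell"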

Running Spark 1.3.1.

Edit

My workaround when using IPython Notebook is the following:

$ IPYTHON_OPTS="notebook" pyspark --jars /path/to/my.jar
Kamil Sindi
  • `Error in pyspark startup: IPYTHON and IPYTHON_OPTS are removed in Spark 2.0+. Remove these from the environment and set PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS instead.` – 123 Oct 10 '22 at 22:56
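Following that comment, on Spark 2.0+ the same notebook workaround would use the driver-Python variables instead; a minimal sketch, assuming Jupyter is the desired front end:

$ export PYSPARK_DRIVER_PYTHON=jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS=notebook
$ pyspark --jars /path/to/my.jar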

3 Answers


You can add the jar files in the spark-defaults.conf file (located in the conf folder of your Spark installation). If there is more than one entry in the jars list, use : as the separator.

spark.driver.extraClassPath /path/to/my.jar
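For example, with two entries (the second path is only a placeholder):

spark.driver.extraClassPath /path/to/my.jar:/path/to/other.jar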

This property is documented in https://spark.apache.org/docs/1.3.1/configuration.html#runtime-environment

Diego Rodríguez
  • I get an error: `Py4JJavaError: An error occurred while calling o28.load. : java.sql.SQLException: No suitable driver at java.sql.DriverManager.getDriver(DriverManager.java:315)` – FullStack Sep 04 '16 at 14:12
  • @FullStack me too, have you found a solution? – thebeancounter Aug 27 '17 at 14:40

As far as I know, you have to add the jars to both the driver AND the executors. So you need to edit conf/spark-defaults.conf, adding both lines below.

spark.driver.extraClassPath /path/to/my.jar
spark.executor.extraClassPath /path/to/my.jar

When I went through this, I did not need any other parameters. I guess you will not need them either.
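As a side note, the property that mirrors the --jars flag from the question is spark.jars; a minimal sketch in spark-defaults.conf, assuming you also want the jar shipped to the executors:

spark.jars /path/to/my.jar

It takes a comma-separated list and adds the jars to both the driver and executor classpaths.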

paulochf
  • `spark.executor.extraClassPath` is only for backwards-compatibility (not sure at which point it changed) but in the current docs it states you don't need it unless you're running an older version: https://spark.apache.org/docs/latest/configuration.html#runtime-environment – Mark J Miller Feb 10 '17 at 01:07

The recommended way since Spark 2.0+ is to use spark.driver.extraLibraryPath and spark.executor.extraLibraryPath.

https://spark.apache.org/docs/2.4.3/configuration.html#runtime-environment

P.S. spark.driver.extraClassPath and spark.executor.extraClassPath are still there, but deprecated and will be removed in a future release of Spark.
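For completeness, a spark-defaults.conf sketch of what this answer names; note that the linked docs describe extraLibraryPath as setting the path to native libraries, and the paths below are placeholders:

spark.driver.extraLibraryPath /path/to/native/libs
spark.executor.extraLibraryPath /path/to/native/libs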

Tagar