Trouble adding the spark-csv package in the Cloudera VM

Question

I am using the Cloudera QuickStart VM to test some PySpark work. For one task, I need to add the spark-csv package. Here is what I did:

PYSPARK_DRIVER_PYTHON=ipython pyspark -- packages com.databricks:spark-csv_2.10:1.3.0

pyspark started up fine, but I got these warnings:

**16/02/09 17:41:22 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
16/02/09 17:41:22 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/09 17:41:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable**

Then I ran my code in pyspark:

yelp_df = sqlCtx.load(
    source="com.databricks.spark.csv",
    header='true',
    inferSchema='true',
    path='file:///directory/file.csv')

But I am getting an error message:

Py4JJavaError: An error occurred while calling o19.load.
: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
    at scala.sys.package$.error(package.scala:27)

What could have gone wrong? Thanks in advance for your help.

score 0 · Answer 1 · answered Feb 25 '16 at 17:23

Try this:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.3.0

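Your command has a typo: there is a space between "--" and "packages". The option has to be written as --packages, with no space. With the space, no --packages option is passed, which is consistent with the "Failed to load class for data source" error, since the spark-csv package is then never added to the session.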
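To see why a single space matters, here is a minimal sketch using Python's standard shlex module, which splits a command line roughly the way a POSIX shell does. The variable names are purely illustrative, and the sketch only shows how the arguments are tokenized, not how pyspark handles the stray ones afterwards.

```python
import shlex

# Command as typed in the question (space after the dashes).
broken = "pyspark -- packages com.databricks:spark-csv_2.10:1.3.0"
# Command with the space removed.
fixed = "pyspark --packages com.databricks:spark-csv_2.10:1.3.0"

# Split each command line into the argument list the program would receive.
broken_args = shlex.split(broken)
fixed_args = shlex.split(fixed)

print(broken_args)
# ['pyspark', '--', 'packages', 'com.databricks:spark-csv_2.10:1.3.0']
print(fixed_args)
# ['pyspark', '--packages', 'com.databricks:spark-csv_2.10:1.3.0']

# Only the second form actually contains a --packages option.
print("--packages" in broken_args)  # False
print("--packages" in fixed_args)   # True
```

In the first form, "--" and "packages" arrive as two separate arguments, so the package coordinates are never attached to a --packages option.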

answered Feb 25 '16 at 17:23

Carlos Delgado
