
I have downloaded the precompiled version of Apache Spark 1.6.0/1.6.1, and when I try to do

     scala> val data = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

in my spark shell, I get

java.lang.ClassNotFoundException: Failed to load class for data source: libsvm

I did a Stack Overflow search and found this link: failed-to-load class libsvm, which indicates that this should work with 1.6, but somehow it didn't work for me. What do I need to do to get this working?

  • Similar to the other Stack Overflow link, using MLUtils is OK; for example, "val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")" works – dbspace May 22 '16 at 21:15
  • Could it be a classpath issue? How did you start the Spark shell? Do you have any other versions of Spark or Scala installed? Is the environment variable SPARK_HOME set to something (in that case, clear it and try again)? Are you on a Unix-like OS? – Jakob Odersky May 23 '16 at 18:41
  • I first uncompress the precompiled version into a folder, for example spark-1.6.1-bin-hadoop2.6, then cd into that folder and start the Spark shell with bin/spark-shell. I tried this with 1.6.0 as well and ran into the same problem. The problem happens on both macOS and Linux (RHEL 6.x). – dbspace May 25 '16 at 01:21
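
For reference, here is a self-contained version of the MLUtils workaround mentioned in the first comment (a minimal sketch, assuming a Spark 1.6 shell where sc is the usual SparkContext):

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.util.MLUtils
    import org.apache.spark.rdd.RDD

    // loads LIBSVM-format data as an RDD of labeled points,
    // bypassing the DataFrame "libsvm" data source entirely
    val examples: RDD[LabeledPoint] =
      MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")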

2 Answers


I found out that I had the SPARK_HOME environment variable set to an older version. Even though spark-shell was run from the right location, it was somehow still using SPARK_HOME to load some libraries. Once I unset the SPARK_HOME environment variable, it worked fine.
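
For example, in a bash-like shell (a sketch; the folder name follows the one from the comments):

    # clear the stale SPARK_HOME so spark-shell picks up the libraries
    # from the distribution it is launched from
    unset SPARK_HOME
    cd spark-1.6.1-bin-hadoop2.6
    bin/spark-shell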

– dbspace

Make sure that you have included the MLlib dependency in your sbt/pom file:

    libraryDependencies += "org.apache.spark" %% "spark-mllib" % sparkVersion
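
In context, a minimal build.sbt sketch (the sparkVersion value and the Scala version are assumptions; adjust them to match your setup):

    // build.sbt — minimal sketch for a Spark 1.6 project
    scalaVersion := "2.10.6"

    val sparkVersion = "1.6.1"

    // spark-mllib pulls in spark-core and spark-sql transitively
    libraryDependencies += "org.apache.spark" %% "spark-mllib" % sparkVersion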
– Arush Kharbanda