
I am trying out SuccinctRDD as a search mechanism. Below is what I am trying, following the documentation:

import edu.berkeley.cs.succinct.kv._
val data = sc.textFile("file:///home/aman/data/jsonDoc1.txt")
val succintdata = data.succinct.persist()

The documentation link is here: ...Succinct RDD

This is the error I get:

<console>:32: error: value succinct is not a member of org.apache.spark.rdd.RDD[String]
         val succintdata = data.succinct.persist()  

Can anybody point out the problem here, or any step I should follow before this?

This is my sbt build:

name := "succinttest"

version := "1.0"

scalaVersion := "2.11.7"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.5.2"
libraryDependencies += "org.apache.kafka" % "kafka_2.11" % "0.8.2.2"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "1.5.2"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "1.5.2"
libraryDependencies += "amplab" % "succinct" % "0.1.7"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0" excludeAll ExclusionRule(organization = "javax.servlet")
Amaresh

1 Answer


This is a typical implicit conversion problem in Scala.

When you import the library:

import edu.berkeley.cs.succinct.kv._

you are importing all the classes and methods from this package, including its implicits. If you check the package object in the source (https://github.com/amplab/succinct/blob/master/spark/src/main/scala/edu/berkeley/cs/succinct/kv/package.scala), you will see that it defines the following implicit conversion:

implicit class SuccinctContext(sc: SparkContext) {
  def succinctKV[K: ClassTag](filePath: String, storageLevel: StorageLevel = StorageLevel.MEMORY_ONLY) 
  (implicit ordering: Ordering[K])
  : SuccinctKVRDD[K] = SuccinctKVRDD[K](sc, filePath, storageLevel)
}
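You can see the same mechanism at a smaller scale without Spark. The sketch below is hypothetical. `Context` stands in for SparkContext, `enrichments` stands in for the succinct `kv` package object, and `loadKV` is an invented method name. It shows two things. First, the extra method exists only where the implicit class has been imported. Second, it exists only on the type that the implicit class wraps. That is why a value of a different type, such as the `RDD[String]` in the question, does not gain it.

```scala
// Hypothetical stand-in for SparkContext (not a Spark class).
final class Context(val name: String)

// Hypothetical stand-in for a package object such as edu.berkeley.cs.succinct.kv.
object enrichments {
  // Wraps Context only: the extra method appears on Context values and nothing else.
  implicit class RichContext(ctx: Context) {
    def loadKV(path: String): String = s"${ctx.name} loading $path"
  }
}

object ImplicitDemo {
  def run(): String = {
    val ctx = new Context("sc")
    // Before this import, `ctx.loadKV(...)` fails to compile with
    // "value loadKV is not a member of Context".
    import enrichments._
    // After the import, the compiler rewrites this call to RichContext(ctx).loadKV(...).
    // Calling loadKV on a List[String] would still not compile, because
    // RichContext only wraps Context.
    ctx.loadKV("file:///tmp/data.txt")
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Running `ImplicitDemo.main` prints `sc loading file:///tmp/data.txt`. If the import is removed, or if `loadKV` is called on an unrelated type, the program is rejected at compile time with the same kind of "is not a member of" error shown in the question.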

This means that SparkContext gains a new method, succinctKV, which creates a new SuccinctKVRDD from a text file. So try the following code:

import edu.berkeley.cs.succinct.kv._
val data = sc.succinctKV("file:///home/aman/data/jsonDoc1.txt")

You will then have a Succinct RDD that supports all the operations you need, such as search and filterByValue: https://github.com/amplab/succinct/blob/master/spark/src/main/scala/edu/berkeley/cs/succinct/kv/SuccinctKVRDD.scala

Carlos Verdes