
I need to do some text preprocessing in Spark 1.6. Following the answer from Simplest method for text lemmatization in Scala and Spark, I have to import java.util.Properties. But when running sbt compile and assembly, I got the following error:

[warn] Class java.util.function.Function not found - continuing with a stub.
[warn] Class java.util.function.Function not found - continuing with a stub.
[warn] Class java.util.function.Function not found - continuing with a stub.
[error] Class java.util.function.Function not found - continuing with a stub.
[error] Class java.util.function.Function not found - continuing with a stub.
[warn] four warnings found
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 52 s, completed Feb 10, 2016 2:11:12 PM

The code is as follows:

    // ref https://stackoverflow.com/questions/30222559/simplest-methodfor-text-lemmatization-in-scala-and-spark?rq=1
    def plainTextToLemmas(text: String): Seq[String] = {
      import java.util.Properties

      import edu.stanford.nlp.ling.CoreAnnotations._
      import edu.stanford.nlp.pipeline._

      import scala.collection.JavaConversions._
      import scala.collection.mutable.ArrayBuffer
      // val stopWords = Set("stopWord")

      val props = new Properties()
      props.put("annotators", "tokenize, ssplit, pos, lemma")
      val pipeline = new StanfordCoreNLP(props)
      val doc = new Annotation(text)
      pipeline.annotate(doc)
      val lemmas = new ArrayBuffer[String]()
      val sentences = doc.get(classOf[SentencesAnnotation])
      for (sentence <- sentences;
           token <- sentence.get(classOf[TokensAnnotation])) {
        val lemma = token.get(classOf[LemmaAnnotation])
        if (lemma.length > 2) {
          lemmas += lemma.toLowerCase
        }
      }
      lemmas
    }
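
Constructing the StanfordCoreNLP pipeline is expensive, so in Spark it is usually hoisted out of the per-record path. As a minimal sketch of that shape (plain Scala, no Spark or CoreNLP on the classpath here; `makePipeline` and `lemmatize` are hypothetical stand-ins for the StanfordCoreNLP setup and the body of plainTextToLemmas), this is the kind of function you would pass to rdd.mapPartitions so the pipeline is built once per partition rather than once per record:

```scala
// Sketch only: `P` is whatever pipeline type `makePipeline` produces
// (e.g. StanfordCoreNLP in the real code).
def lemmatizePartition[P](docs: Iterator[String],
                          makePipeline: () => P,
                          lemmatize: (P, String) => Seq[String]): Iterator[Seq[String]] = {
  val pipeline = makePipeline()          // built once for the whole partition
  docs.map(doc => lemmatize(pipeline, doc))
}
```

In real Spark code this would be invoked as `rdd.mapPartitions(docs => lemmatizePartition(docs, ...))`, which avoids re-initializing the annotator models for every document.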

My sbt file is as follows:

scalaVersion := "2.11.7"

crossScalaVersions := Seq("2.10.5", "2.11.0-M8")

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "1.6.0" % "provided",
  "org.apache.spark" % "spark-mllib_2.10" % "1.6.0" % "provided",
  "org.apache.spark" % "spark-sql_2.10" % "1.6.0" % "provided",
  "com.github.scopt" % "scopt_2.10" % "3.3.0"
)

libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "3.5.2",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.5.2" classifier "models"
  //   "edu.stanford.nlp" % "stanford-corenlp" % "3.5.2" classifier "models-chinese"
  //   "edu.stanford.nlp" % "stanford-corenlp" % "3.5.2" classifier "models-german"
  //   "edu.stanford.nlp" % "stanford-corenlp" % "3.5.2" classifier "models-spanish"
  //"com.google.code.findbugs" % "jsr305" % "2.0.3"
)

Taking the suggestion from that site, I changed the Java library version from 1.7 to 1.8, but the problem is still there.

  • strange, cannot recreate this, I guess you can use the back port by adding this to your sbt dependency and see what happens: `libraryDependencies += "net.sf.m-m-m" %"mmm-util-backport-java.util.function" %"1.0.1"` – GameOfThrows Feb 10 '16 at 09:45
  • 1
    thanks @GameOfThrows. I solved the problem by setting my java home linking to java 1.8. previously, java home was linked to java 1.7. though the project SDKs link to java 1.8, probably (based on my observation and deduction) sbt compile command runs with default java home setting. reasonable? – HappyCoding Feb 10 '16 at 12:09

1 Answer


The problem is solved by setting JAVA_HOME to Java 8. Previously, I had changed the project SDK to Java 8 while JAVA_HOME still pointed to Java 7, so compilation failed when sbt ran.
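If changing the machine-wide JAVA_HOME is inconvenient, sbt also lets you pin the JDK in the build itself via the `javaHome` setting. A hedged sketch (the path below is a placeholder for your local JDK 8 installation, not a value from the question):

```scala
// build.sbt -- force compilation with a specific JDK instead of whatever
// JVM happens to be on JAVA_HOME. Replace the path with your JDK 8 install.
javaHome := Some(file("/usr/lib/jvm/java-8-openjdk-amd64"))
```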

HappyCoding