I am using Spark's Word2Vec with a DataFrame. Here is my code:

    import org.apache.spark.ml.feature.Word2Vec
    import org.apache.spark.sql.functions.collect_list

    // Collect all terms that share a label into one array column
    val df2 = df.groupBy("LABEL").agg(collect_list("TERM").alias("TERM"))
    val word2Vec = new Word2Vec()
      .setInputCol("TERM")
      .setOutputCol("result")
      .setMinCount(0)
    val model = word2Vec.fit(df2)
    val result = model.transform(df2)
    val synonyms = model.findSynonyms("4", 10)
    //synonyms.foreach(println)
    for((synonym, cosineSimilarity) <- synonyms) {
      println(s"$synonym $cosineSimilarity")
    }
When I use synonyms.foreach(println), the code works, but the returned results are not ordered by their similarity scores. So I tried the for loop shown at the bottom of the code instead, and it fails to compile with the following error:
Error:(52, 40) missing parameter type for expanded function
The argument types of an anonymous function must be fully known. (SLS 8.5)
Expected type was: ?
for((synonym, cosineSimilarity) <- synonyms) {
^
From similar Stack Overflow questions and the error message, it seems the exact argument types are needed. In the for loop, synonyms is a DataFrame, and its two columns hold values of type String and Double, respectively. All my attempts so far have failed. How can I fix this?
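
For reference, here is a minimal sketch of what I am trying to achieve, assuming the DataFrame returned by findSynonyms has the columns "word" and "similarity" (as in the spark.ml API). It sorts on the similarity column and pattern-matches each Row instead of destructuring a tuple. The toy data, the object name SynonymsExample, the helper rankedSynonyms, and the local SparkSession are hypothetical stand-ins for my real setup, not part of my actual code:

```scala
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{collect_list, desc}

object SynonymsExample {

  // Hypothetical helper: trains on toy data and returns the synonyms of "4",
  // sorted by similarity in descending order.
  def rankedSynonyms(spark: SparkSession): Array[(String, Double)] = {
    import spark.implicits._

    // Toy data standing in for the original `df` (assumption)
    val df = Seq(
      ("a", "1"), ("a", "4"), ("a", "2"),
      ("b", "4"), ("b", "3"), ("b", "1")
    ).toDF("LABEL", "TERM")

    val df2 = df.groupBy("LABEL").agg(collect_list("TERM").alias("TERM"))

    val model = new Word2Vec()
      .setInputCol("TERM")
      .setOutputCol("result")
      .setMinCount(0)
      .setVectorSize(8) // small vectors are enough for toy data
      .fit(df2)

    // findSynonyms returns a DataFrame, not a collection of tuples
    val synonyms = model.findSynonyms("4", 3)

    synonyms
      .orderBy(desc("similarity")) // sort explicitly by the score column
      .collect()                   // bring the (small) result to the driver
      .map { case Row(word: String, similarity: Double) => (word, similarity) }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("word2vec-synonyms")
      .getOrCreate()

    for ((synonym, cosineSimilarity) <- rankedSynonyms(spark)) {
      println(s"$synonym $cosineSimilarity")
    }

    spark.stop()
  }
}
```

Once the rows are collected into an Array[(String, Double)], the tuple pattern in the for loop has a known element type, so the "missing parameter type" error should no longer apply. On Spark 2.2 or later, model.findSynonymsArray("4", 10) might be a shorter route, since it returns an Array[(String, Double)] directly.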