Given a DataFrame:
+---+----------+
|key| value|
+---+----------+
|foo| bar|
|bar| one, two|
+---+----------+
I'd like to use the value column as input to FPGrowth, which expects an RDD[Array[String]]:
import org.apache.spark.mllib.fpm.{FPGrowth, FPGrowthModel}
import org.apache.spark.rdd.RDD

// Convert each row's value into an Array[String] transaction
val transactions: RDD[Array[String]] = df.select("value").rdd.map(x => x.getList(0).toArray.map(_.toString))
val fpg = new FPGrowth().setMinSupport(0.01)
val model = fpg.run(transactions)
I get this exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 141.0 failed 1 times, most recent failure: Lost task 7.0 in stage 141.0 (TID 2232, localhost): java.lang.ClassCastException: java.lang.String cannot be cast to scala.collection.Seq
Any suggestions welcome!
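For what it's worth, the ClassCastException suggests the value column is a plain String (e.g. "one, two") rather than an array, so getList(0) fails when the row actually holds a String. A minimal sketch of a workaround I considered, assuming the values really are comma-separated strings (the split pattern is my assumption, not from my schema):

```scala
// Assumption: `value` is a String column like "one, two",
// so read it with getString and split on commas instead of getList.
val transactions: RDD[Array[String]] =
  df.select("value").rdd.map(row => row.getString(0).split(",\\s*"))
```

Is splitting the string like this the right approach, or should the column be stored as an array type in the first place?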