I have followed this solution for one-hot encoding. Now I want the last element of each record (which is an array of integers) to be expanded so that I get an individual column for each one-hot-encoded value.
My current RDD is:
scala> encode_cars
res2: org.apache.spark.rdd.RDD[(Double, Double, Double, Double, Array[Int])] = MapPartitionsRDD[17] at map at <console>:27
Ideally, I would want something like:
res2: org.apache.spark.rdd.RDD[(Double, Double, Double, Double, Int, Int, Int, Int, Int, Int, Int)] = MapPartitionsRDD[17] at map at <console>:27
I know that this could be done using map/flatMap, but I am not sure how to do it.
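A minimal sketch of one way to do this with a plain map, assuming every one-hot array has exactly 7 elements (inferred from the seven Int columns in the desired type). The names `ExpandOneHot`, `Row`, `Wide` and `expand` are hypothetical, and the sample record is made up for illustration. The function pattern-matches the array into its elements and rebuilds a wider tuple; it is written without Spark so it can be run on its own.

```scala
// Sketch: flatten (Double, Double, Double, Double, Array[Int]) into a Tuple11.
// ASSUMPTION: every one-hot array has exactly 7 elements; any other length
// falls through the pattern and throws a scala.MatchError.
object ExpandOneHot {

  // Hypothetical aliases for the current and desired record types.
  type Row  = (Double, Double, Double, Double, Array[Int])
  type Wide = (Double, Double, Double, Double, Int, Int, Int, Int, Int, Int, Int)

  // Array(...) in a pattern matches only arrays of exactly that length,
  // binding each element to its own name.
  def expand(row: Row): Wide = row match {
    case (a, b, c, d, Array(e0, e1, e2, e3, e4, e5, e6)) =>
      (a, b, c, d, e0, e1, e2, e3, e4, e5, e6)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample record standing in for one element of encode_cars.
    val sample: Row = (1.0, 2.0, 3.0, 4.0, Array(0, 0, 1, 0, 0, 0, 0))
    println(expand(sample)) // prints (1.0,2.0,3.0,4.0,0,0,1,0,0,0,0)
  }
}
```

On the RDD itself, the same function can be passed to `map`, e.g. `val wide = encode_cars.map(ExpandOneHot.expand)`, which should give an `RDD[(Double, Double, Double, Double, Int, Int, Int, Int, Int, Int, Int)]`. `map` rather than `flatMap` fits here because each input record produces exactly one output record; `flatMap` is for producing zero or more records per input. If the arrays can vary in length, a fixed-arity tuple cannot represent them (and in Scala 2, tuples stop at 22 elements), so keeping a `Seq` or `Array` for the encoded part would be the safer choice.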