I would like to read a huge CSV file and turn every row into a vector by splitting its values on ",". In the end I want an RDD of Vectors that holds the values. However, I get the following error at the Seq:
type mismatch;
 found   : Unit
 required: org.apache.spark.mllib.linalg.Vector
Error occurred in an application involving default arguments.
Here is my code so far:
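import scala.io.Source
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

val file = "/data.csv"
val data: RDD[Vector] = sc.parallelize(
  Seq(
    for (line <- Source.fromFile(file).getLines) {
      Vectors.dense(line.split(",").map(_.toDouble).distinct)
    }
  )
)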
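
A note on the likely cause: a `for` loop without `yield` evaluates to `Unit`, so the `Seq(...)` above holds a single `Unit` value rather than vectors, which matches the "found: Unit" in the message. Below is a minimal sketch of one possible way to get an `RDD[Vector]`, not a definitive fix. It assumes the file has no header row, that every field parses as a `Double`, and that the path is readable from the executors. It reads the file with `sc.textFile` instead of `Source.fromFile`, so a huge file does not have to pass through the driver first. The names `CsvVectors` and `loadVectors` are hypothetical, and `.distinct` is left out on the assumption that repeated values within a row should be kept.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object CsvVectors {
  // Assumes: no header row, every field is numeric, `path` is visible to Spark.
  def loadVectors(sc: SparkContext, path: String): RDD[Vector] =
    sc.textFile(path)                        // RDD[String], one element per line
      .map { line =>
        // Split on commas and parse each field; a malformed field throws here.
        val values = line.split(",").map(_.trim.toDouble)
        Vectors.dense(values)                // one dense vector per CSV row
      }
}
```

With an existing `SparkContext` named `sc`, as in the Spark shell, this could be called as `val data: RDD[Vector] = CsvVectors.loadVectors(sc, "/data.csv")`. If the lines really must be read on the driver, replacing `{ ... }` with `yield ...` in the `for` expression and passing the resulting collection (for example via `.toSeq`) directly to `sc.parallelize`, without the extra `Seq(...)` wrapper, should also type-check.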