I'm trying to convert a Spark DataFrame to a Breeze DenseMatrix in Scala. I couldn't find a built-in function for this, so here's what I'm doing:
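import scala.util.Random
import breeze.linalg.DenseMatrix
import spark.implicits._ // needed for .toDF outside spark-shell; `spark` is the SparkSession

// 10 rows of random features in 3 columns
val featuresDF = (1 to 10)
  .map(_ => (Random.nextDouble(), Random.nextDouble(), Random.nextDouble()))
  .toDF("F1", "F2", "F3")

// Collect each column separately into its own Array[Double]
var FeatureArray: Array[Array[Double]] = Array.empty
val features = featuresDF.columns
for (i <- features.indices) {
  FeatureArray = FeatureArray :+ featuresDF.select(features(i)).collect().map(_.getDouble(0))
}

// Each inner array becomes a row, so transpose to get one column per feature
val desnseMat = DenseMatrix(FeatureArray: _*).t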
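For comparison, here is a minimal sketch of the same conversion done with a single `collect()`, filling Breeze's column-major buffer directly. It assumes every column is a non-null `Double`; the names `DataFrameToBreeze` and `toDenseMatrix` are hypothetical. It avoids running one Spark job per column and the per-value `toString` round trip, but it still brings every row to the driver, so it will not fix an OOM caused by the data itself not fitting there.

```scala
import org.apache.spark.sql.DataFrame
import breeze.linalg.DenseMatrix

object DataFrameToBreeze {
  // Hypothetical helper: convert all columns of a DataFrame into an n x d
  // Breeze DenseMatrix using one collect() instead of one per column.
  // Assumes every column is a non-null DoubleType.
  def toDenseMatrix(df: DataFrame): DenseMatrix[Double] = {
    val rows = df.collect()            // single Spark job; all rows land on the driver
    val nRows = rows.length
    val nCols = df.columns.length
    // Breeze stores DenseMatrix data in column-major order
    val data = new Array[Double](nRows * nCols)
    for (i <- 0 until nRows; j <- 0 until nCols) {
      data(j * nRows + i) = rows(i).getDouble(j)
    }
    new DenseMatrix(nRows, nCols, data)
  }
}
```

No transpose is needed afterwards, because each DataFrame column is written straight into a matrix column.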
This works and gives me the result I want, but it causes out-of-memory (OOM) exceptions in my environment. Is there a better way to do this conversion? My ultimate goal is to compute the eigenvalues and eigenvectors of the features' covariance matrix from the dense matrix:
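import breeze.stats.covmat
import breeze.linalg.eig

// Covariance of the feature columns (each row is one observation)
val covariance = covmat(desnseMat)
// Eigenvalues and eigenvectors of that covariance matrix
val eigen = eig(covariance)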
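Since the eigendecomposition only needs the d × d covariance matrix, one option worth sketching is to let Spark compute that matrix across the cluster with MLlib's `RowMatrix.computeCovariance()` and run only the small eigendecomposition locally in Breeze. This swaps `eig` for Breeze's `eigSym`, on the reasoning that a covariance matrix is symmetric and `eigSym` returns real eigenvalues directly. `DistributedEigen` and `covarianceEigen` are hypothetical names, and all columns are assumed to be non-null `Double`s.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.DataFrame
import breeze.linalg.{eigSym, DenseMatrix, DenseVector}

object DistributedEigen {
  // Hypothetical helper: compute the d x d covariance matrix with Spark
  // (distributed), then eigendecompose it locally with Breeze.
  // Assumes every column of df is a non-null Double feature.
  def covarianceEigen(df: DataFrame): (DenseVector[Double], DenseMatrix[Double]) = {
    val d = df.columns.length
    // One MLlib dense vector per row; the rows stay distributed on the executors
    val rows = df.rdd.map(r => Vectors.dense(Array.tabulate(d)(r.getDouble)))
    // Sample covariance (divides by n - 1), returned as a small local matrix
    val cov = new RowMatrix(rows).computeCovariance()
    // MLlib's Matrix.toArray is column-major, which is also Breeze's layout
    val brzCov = new DenseMatrix(cov.numRows, cov.numCols, cov.toArray)
    // The covariance matrix is symmetric, so eigSym gives real eigenvalues
    val es = eigSym(brzCov)
    (es.eigenvalues, es.eigenvectors)
  }
}
```

With this approach only a d × d matrix is ever materialized on the driver, so driver memory no longer grows with the number of rows.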
It would be even better if there were a direct way to get the eigenvalues and eigenvectors from the DataFrame itself. PCA in Spark ML must be doing this calculation on the features column, so is there a way to access the eigenvalues through PCA?
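As far as I know, Spark ML's `PCAModel` exposes the eigenvectors as `pc` and the eigenvalues only as normalized ratios in `explainedVariance`. The sketch below assumes, from my reading of Spark's `RowMatrix.computePrincipalComponentsAndExplainedVariance`, that each ratio is an eigenvalue divided by the sum of all eigenvalues (the trace of the sample covariance matrix), and rescales the ratios by the total sample variance. The names `PcaEigen`, `fitPca`, `features`, and `pcaFeatures` are illustrative only.

```scala
import org.apache.spark.ml.feature.{PCA, VectorAssembler}
import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.var_samp

object PcaEigen {
  // Hypothetical helper: fit Spark ML PCA on all feature columns and return
  // (eigenvectors, explained-variance ratios, rescaled eigenvalues).
  // Assumes every column of df is a non-null Double feature.
  def fitPca(df: DataFrame): (DenseMatrix, DenseVector, Array[Double]) = {
    val cols = df.columns
    // PCA works on a single vector column, so assemble the features first
    val assembled = new VectorAssembler()
      .setInputCols(cols)
      .setOutputCol("features")
      .transform(df)
    val model = new PCA()
      .setInputCol("features")
      .setOutputCol("pcaFeatures")
      .setK(cols.length) // keep every component
      .fit(assembled)
    // Total sample variance = trace of the covariance matrix = sum of eigenvalues
    val varRow = df.select(cols.map(c => var_samp(c)): _*).first()
    val totalVariance = cols.indices.map(varRow.getDouble).sum
    // Assumption: explainedVariance(i) = eigenvalue(i) / sum of all eigenvalues
    val eigenvalues = model.explainedVariance.toArray.map(_ * totalVariance)
    (model.pc, model.explainedVariance, eigenvalues)
  }
}
```

The columns of `pc` should match Breeze's eigenvectors up to sign and ordering, since eigenvectors are only defined up to a sign flip and PCA sorts components by decreasing variance.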