
In the code below, I get a dense Matrix V after running SVD. What I want is:

  1. Take a set of values (say 3, 7 and 9).
  2. Extract the 3rd, 7th and 9th rows of Matrix V.
  3. Calculate the cosine similarity of each of these 3 rows with every row of Matrix V.
  4. Add up the three cosine similarities obtained for each row.
  5. Finally, get the index of the row that has the maximum sum.

import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Three 5-dimensional input rows: one sparse, two dense.
val data = Array(
  Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0))

// `sc` is the SparkContext (predefined in spark-shell).
val dataRDD = sc.parallelize(data)

val mat: RowMatrix = new RowMatrix(dataRDD)

// Compute the top 4 singular values and corresponding singular vectors.
val svd: SingularValueDecomposition[RowMatrix, Matrix] = mat.computeSVD(4, computeU = true)

val U: RowMatrix = svd.U  // The U factor is a RowMatrix.
val s: Vector = svd.s     // The singular values are stored in a local dense vector.
val V: Matrix = svd.V     // The V factor is a local dense matrix.

Please advise an efficient way to do this. I have been thinking of converting Matrix V to an IndexedRowMatrix, but when I use the row iterator on V, how do I keep track of the row indices? Is there a better way to do it?
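
For reference, the five steps could be sketched locally as below, since V is a local matrix and does not need to be distributed. This is only a minimal sketch under assumptions: `CosineRowSearch`, `cosine` and `bestRow` are hypothetical names, the row indices are treated as 0-based, a zero-norm row is given similarity 0.0, and a small hard-coded array stands in for V so the example runs without Spark.

```scala
object CosineRowSearch {
  // Cosine similarity of two equal-length vectors.
  // Assumption: returns 0.0 when either vector has zero norm.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // For every row, sum its cosine similarity to each selected (query) row,
  // then return (index of the row with the largest sum, that sum).
  // queryIdx holds 0-based row indices (an assumption of this sketch).
  def bestRow(rows: Array[Array[Double]], queryIdx: Seq[Int]): (Int, Double) = {
    val queries = queryIdx.map(i => rows(i))
    val scores = rows.map(r => queries.map(q => cosine(q, r)).sum)
    val best = scores.indices.maxBy(i => scores(i))
    (best, scores(best))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for V. With Spark, the rows could be copied out as:
    //   Array.tabulate(V.numRows, V.numCols)((i, j) => V(i, j))
    val rows = Array(
      Array(1.0, 0.0),
      Array(0.0, 1.0),
      Array(1.0, 1.0),
      Array(-1.0, 0.0))

    val (idx, score) = bestRow(rows, Seq(0, 1))
    println(s"best row = $idx, summed cosine = $score")
  }
}
```

Because the rows live in an array, the array position is the row index, so no separate index tracking is needed. If iterating with `V.rowIter` is preferred (available on the mllib `Matrix` in Spark 2.x, as far as I know), `V.rowIter.zipWithIndex` would pair each row with its index. Note that the selected rows are themselves among the candidates, as the steps above describe, and each contributes a similarity of 1 with itself.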

ayush gupta
[This](https://stackoverflow.com/questions/40681794/spark-scala-how-to-group-dataframe-rows-and-apply-complex-function-to-the-grou) may be a good help for you – Ramesh Maharjan Jun 12 '17 at 04:48
