For very large and sparse matrix, (like the one you get from text feature extraction), the best and easiest way is:
def transposeRowMatrix(m: RowMatrix): RowMatrix = {
val indexedRM = new IndexedRowMatrix(m.rows.zipWithIndex.map({
case (row, idx) => new IndexedRow(idx, row)}))
val transposed = indexedRM.toCoordinateMatrix().transpose.toIndexedRowMatrix()
new RowMatrix(transposed.rows
.map(idxRow => (idxRow.index, idxRow.vector))
.sortByKey().map(_._2))
}
For not so sparse matrix, you can use BlockMatrix as the bridge as mentioned by aletapool's answer above.
However aletapool's answer misses a very important point: When you start from RowMaxtrix -> IndexedRowMatrix -> BlockMatrix -> transpose -> BlockMatrix -> IndexedRowMatrix -> RowMatrix, in the last step (IndexedRowMatrix -> RowMatrix), you have to do a sort. Because by default, converting from IndexedRowMatrix to RowMatrix, the index is simply dropped and the order will be messed up.
val data = Array(
MllibVectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
MllibVectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
MllibVectors.dense(4.0, 0.0, 0.0, 6.0, 7.0),
MllibVectors.sparse(5, Seq((2, 2.0), (3, 7.0))))
val dataRDD = sc.parallelize(data, 4)
val testMat: RowMatrix = new RowMatrix(dataRDD)
testMat.rows.collect().map(_.toDense).foreach(println)
[0.0,1.0,0.0,7.0,0.0]
[2.0,0.0,3.0,4.0,5.0]
[4.0,0.0,0.0,6.0,7.0]
[0.0,0.0,2.0,7.0,0.0]
transposeRowMatrix(testMat).
rows.collect().map(_.toDense).foreach(println)
[0.0,2.0,4.0,0.0]
[1.0,0.0,0.0,0.0]
[0.0,3.0,0.0,2.0]
[7.0,4.0,6.0,7.0]
[0.0,5.0,7.0,0.0]