I have a 4×4 block matrix:

Blockmatrix = 
    0.0  2.0  1.0  2.0
    2.0  0.0  2.0  4.0
    1.0  2.0  0.0  3.0
    2.0  4.0  3.0  0.0

Checking the type of this matrix gives:

<class 'pyspark.mllib.linalg.distributed.BlockMatrix'>

I want to change only the diagonal elements to 1.0.

I tried the following code:

diagonal_matrix = DenseMatrix(dataframe_item.numRows,dataframe_item.numCols,dataframe_item.toLocalMatrix().rowIter().zipWithIndex().flatMap(lambda x:(x[1].toArray(x[2]),x[1].toArray())).toArray())

It throws the following error:

AttributeError: 'DenseMatrix' object has no attribute 'rowIter'

Can anyone help me solve this error? Or is there a better way to change the diagonal values of a BlockMatrix in PySpark?

Mani Rz

1 Answer

Try this:

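from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

# sc is assumed to be an existing SparkContext (as in the PySpark shell)
rdd = sc.parallelize([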
    (0, 0.0 , 2.0 , 1.0,  2.0),
    (1, 2.0 , 0.0 , 2.0 , 4.0),
    (2, 1.0 , 2.0 , 0.0 , 3.0),
    (3, 2.0 , 4.0  ,3.0  ,0.0)
    ])

Blockmatrix = IndexedRowMatrix(rdd.map(lambda x: IndexedRow(x[0],x[1:]))).toBlockMatrix()

This reproduces the BlockMatrix from the question.

To replace the diagonal elements with 1.0, we need the row indices. So we convert the BlockMatrix to an IndexedRowMatrix and take its rows RDD, which gives us each row together with its index.

For each row, we then set the element whose column position equals the row index to 1.0, wrap the result back into an IndexedRowMatrix, and convert that to a BlockMatrix.

Blockmatrix_new = IndexedRowMatrix(Blockmatrix.toIndexedRowMatrix().rows\
                 .map(lambda x: IndexedRow(x.index, [1.0 if i == x.index else v for i,v in enumerate(x.vector)])))\
                 .toBlockMatrix()
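
The per-row rule used in the map above can be checked locally without Spark. The following is only a minimal sketch in plain Python: the helper name set_diagonal and the list of (index, values) pairs are hypothetical stand-ins for the IndexedRow objects, not part of the PySpark API.

```python
def set_diagonal(indexed_rows, value=1.0):
    """Return new (index, values) pairs with each row's diagonal entry replaced.

    Hypothetical helper: mirrors the list comprehension inside the Spark map,
    where the entry is replaced when its column position equals the row index.
    """
    return [
        (idx, [value if col == idx else v for col, v in enumerate(values)])
        for idx, values in indexed_rows
    ]


# Same data as the matrix in the question, as (row index, row values) pairs.
rows = [
    (0, [0.0, 2.0, 1.0, 2.0]),
    (1, [2.0, 0.0, 2.0, 4.0]),
    (2, [1.0, 2.0, 0.0, 3.0]),
    (3, [2.0, 4.0, 3.0, 0.0]),
]

result = set_diagonal(rows)
for idx, values in result:
    print(idx, values)
```

Because each row carries its own index, the rule does not depend on the order in which rows arrive, which is why it can be applied row by row on a distributed RDD.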

Blockmatrix_new is the desired matrix. To view its contents, we convert it to a local matrix. Note that this collects the entire matrix to the driver, so it is only suitable for small matrices.

print(Blockmatrix_new.toLocalMatrix())

DenseMatrix([[1., 2., 1., 2.],
             [2., 1., 2., 4.],
             [1., 2., 1., 3.],
             [2., 4., 3., 1.]])
mayank agrawal