pyspark.mllib DenseMatrix multiplication

Question

I have to do matrix multiplication in PySpark but can't find how to do it with DenseMatrix. For example

from pyspark.mllib.linalg import DenseMatrix

nfeatures = 3  # implied by the 9 values passed to Q

Q = DenseMatrix(nfeatures, nfeatures, [1, 0, 0, 0, 1, 0, 0, 0, 1])
w = DenseMatrix(nfeatures, 1, [0, 0, 0])
print(Q * w)

results in the following error:

TypeError: unsupported operand type(s) for *: 'DenseMatrix' and 'DenseMatrix'

What am I doing wrong? Is there a method for doing matrix multiplication? What is the usual way of doing this with PySpark streaming?

Best regards, Noelia

score 2 · Accepted Answer · answered Jul 19 '16 at 15:11

Neither pyspark.ml.linalg.Matrix nor pyspark.mllib.linalg.Matrix implements matrix multiplication. These classes are used mostly as exchange formats for mllib / ml algorithms and are not designed to be used as full-featured data structures for linear algebra.

If you need something more than passing data to some ML / MLlib algorithm, just use the standard NumPy / SciPy stack.
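A minimal sketch of what that could look like for the matrices in the question, assuming nfeatures = 3 (implied by the 9 values in Q). The arrays are built directly in NumPy here so the snippet runs without Spark. If you already have DenseMatrix objects, toArray() should give you the equivalent NumPy arrays.

```python
import numpy as np

nfeatures = 3  # assumption: implied by the 9 values in Q

# DenseMatrix stores its values in column-major order, so order="F" is used
# when rebuilding the same matrices by hand.
Q = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1], dtype=float).reshape(
    (nfeatures, nfeatures), order="F"
)
w = np.array([0, 0, 0], dtype=float).reshape((nfeatures, 1), order="F")

# With existing DenseMatrix objects you would start from
# Q.toArray() and w.toArray() instead of the two lines above.
product = Q @ w  # equivalent to np.dot(Q, w)

print(product.shape)
```

If the result has to go back into an MLlib algorithm, it can be wrapped again, for example with DenseMatrix(product.shape[0], product.shape[1], product.ravel(order="F")), since the constructor expects column-major values.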

zero323


Thank you for your answer. I assume once we convert with `toArray()` to use the standard numpy linalg functions, we are giving up on any potential parallelisation during multiplication? I saw your response [here](http://stackoverflow.com/questions/37766213/spark-matrix-multiplication-with-python), where you suggest using BlockMatrix. Can the poster not convert to BlockMatrix and do the multiplication in a parallelised way there? – Zhubarb Apr 10 '17 at 13:45

@Rhubarb `pyspark.{ml|mllib}.linalg.DenseMatrix` is not distributed at all. Parallelization will depend on the underlying linear algebra library. Regarding conversion to `BlockMatrix`: as long as the data fits in memory, matrix multiplication with good native bindings will be order(s) of magnitude faster, so a conversion like this rarely makes sense. But technically speaking it is possible. – zero323 Apr 10 '17 at 14:49
Thanks, that is what I was thinking. As long as it fits into memory, no point in trying to 'parallelise' a vectorised operation. Right? – Zhubarb Apr 10 '17 at 14:52
@Rhubarb Yeah. This is typically true for any small structure. – zero323 Apr 10 '17 at 14:59
