I need to do matrix multiplication in PySpark but can't find how to do it with DenseMatrix. For example:

from pyspark.mllib.linalg import DenseMatrix

nfeatures = 3  # size implied by the 9 values passed to Q below
Q = DenseMatrix(nfeatures, nfeatures, [1, 0, 0, 0, 1, 0, 0, 0, 1])
w = DenseMatrix(nfeatures, 1, [0, 0, 0])
print(Q * w)

results in the following error:

TypeError: unsupported operand type(s) for *: 'DenseMatrix' and 'DenseMatrix'

What am I doing wrong? Is there a method for doing matrix multiplication? What is the usual way of doing this with PySpark streaming?

Best regards, Noelia

Alper t. Turker
ermutarra

1 Answer

Neither pyspark.ml.linalg.Matrix nor pyspark.mllib.linalg.Matrix implements matrix multiplication. These classes are used mostly as exchange formats for mllib / ml algorithms and are not designed to be used as full-featured data structures for linear algebra.

If you need more than passing data to some ML / MLlib algorithm, just use the standard NumPy / SciPy stack.

zero323
  • Thank you for your answer. I assume that once we convert with `toArray()` to use the standard NumPy linalg functions, we give up any potential parallelisation during multiplication? I saw your response [here](http://stackoverflow.com/questions/37766213/spark-matrix-multiplication-with-python), where you suggest using BlockMatrix. Could the poster not convert to BlockMatrix and do the multiplication in a parallelised way there? – Zhubarb Apr 10 '17 at 13:45
  • @Rhubarb `pyspark.{ml|mllib}.linalg.DenseMatrix` is not distributed at all. Parallelization will depend on the underlying linear algebra library. Regarding conversion to `BlockMatrix`: as long as the data fits in memory, matrix multiplication with good native bindings will be order(s) of magnitude faster, so a conversion like this rarely makes sense. Technically speaking, though, it is possible. – zero323 Apr 10 '17 at 14:49
  • Thanks, that is what I was thinking. As long as it fits into memory, no point in trying to 'parallelise' a vectorised operation. Right? – Zhubarb Apr 10 '17 at 14:52
  • @Rhubarb Yeah. This is typically true for any small structure. – zero323 Apr 10 '17 at 14:59