
With pandas/numpy, multiplying a 2x2 matrix by a length-2 vector broadcasts the vector across the rows, so each column of the matrix is multiplied by the corresponding element of the vector. For example, with numpy:

>>> data = np.array([[1, 2], [3, 4]])
>>> data
array([[1, 2],
       [3, 4]])
>>> data * [2, 4]
array([[ 2,  8],
       [ 6, 16]])

How can this operation be done with Spark/Breeze? I tried new DenseVector(2, 2, Array(1,2,3,4)) * DenseVector(2, 4), but it did not work.

zero323
chargercable

2 Answers


Spark DataFrames are not designed for linear algebra operations. In theory, you can combine all the columns into a single vector column with VectorAssembler and then multiply it element-wise with ElementwiseProduct:

import org.apache.spark.ml.feature.{ElementwiseProduct, VectorAssembler}
// Spark 1.x import; on Spark 2.0+ the ml package expects org.apache.spark.ml.linalg.Vectors
import org.apache.spark.mllib.linalg.Vectors

val assembler = new VectorAssembler()
  .setInputCols(Array("x1", "x2"))
  .setOutputCol("xs")

val product = new ElementwiseProduct()
  .setScalingVec(Vectors.dense(Array(2.0, 4.0)))
  .setInputCol("xs")
  .setOutputCol("xs_transformed")

val df = sc.parallelize(Seq((1.0, 2.0), (3.0, 4.0))).toDF("x1", "x2")

product.transform(assembler.transform(df)).select("xs_transformed").show
// +--------------+
// |xs_transformed|
// +--------------+
// |     [2.0,8.0]|
// |    [6.0,16.0]|
// +--------------+

but it is useful only for basic transformations.
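
For a small, fixed set of numeric columns, the same scaling can also be written as plain column arithmetic, without assembling a vector first. This is a different technique from the ElementwiseProduct approach, shown only as a minimal sketch. It assumes Spark 2.0 or later (SparkSession) running in local mode; the `factors` name and the application name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local Spark 2.0+ session; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("column-scaling-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("x1", "x2")

// One multiplier per column, in column order (hypothetical name).
val factors = Seq(2.0, 4.0)
require(factors.length == df.columns.length,
  "need exactly one factor per column")

// Multiply every column by its factor and keep the original column name.
val scaled = df.select(
  df.columns.zip(factors).map { case (name, f) => (col(name) * f).as(name) }: _*
)

scaled.show()  // rows: (2.0, 8.0) and (6.0, 16.0)
```

The explicit `require` is there because `zip` would otherwise silently drop any columns or factors that have no partner.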

zero323

In Breeze, this is done with the special broadcasting value *.

scala> import breeze.linalg._
import breeze.linalg._

scala> val dm = DenseMatrix((1,2), (3,4))
dm: breeze.linalg.DenseMatrix[Int] =
1  2
3  4

scala> dm(*, ::) :* DenseVector(2,4)
res0: breeze.linalg.DenseMatrix[Int] =
2  8
6  16

dm(*, ::) says "apply the operation to every row". Element-wise multiplication is :*, while matrix multiplication is *.

dlwh
  • Great solution. It works with the test case, though it doesn't seem safe when the vector is longer than a row. Here's numpy: `>>> data * [1,2,3] Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: operands could not be broadcast together with shapes (2,2) (3,)`. – chargercable Jan 06 '16 at 21:44
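
One way to address the length concern raised in the comment is to check the dimensions explicitly before broadcasting, so a mismatch fails with a clear error, much like numpy's ValueError. The sketch below is only an illustration: `scaleColumns` is a hypothetical helper, not part of Breeze, and it uses the `:*` operator from the answer (newer Breeze releases spell element-wise multiplication `*:*`).

```scala
import breeze.linalg._

// Hypothetical helper (not a Breeze API): multiply each column of `m`
// by the matching element of `v`, failing fast on a length mismatch.
def scaleColumns(m: DenseMatrix[Int], v: DenseVector[Int]): DenseMatrix[Int] = {
  require(v.length == m.cols,
    s"vector length ${v.length} does not match column count ${m.cols}")
  // Same broadcasting expression as in the answer: apply :* to every row.
  m(*, ::) :* v
}

val dm = DenseMatrix((1, 2), (3, 4))

// Matches the answer's result: rows (2, 8) and (6, 16).
val ok = scaleColumns(dm, DenseVector(2, 4))

// A vector of the wrong length is rejected by `require`
// with an IllegalArgumentException instead of being broadcast.
val bad = scala.util.Try(scaleColumns(dm, DenseVector(1, 2, 3)))
```

Wrapping the second call in `Try` just keeps the sketch from aborting; in real code the exception would normally be left to propagate.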