
In order to use Spark's machine learning capabilities, I converted my training data to Spark vectors (`DenseVector` or `SparseVector`). Before I can feed the data into Spark's `fit` method, I have to do some arithmetic on it: addition, scalar multiplication, and dot products.
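As a baseline, the three operations can be written against nothing but the public `toArray` and `Vectors.dense` calls of `org.apache.spark.mllib.linalg`. This is a minimal sketch, assuming `spark-mllib` is on the classpath; the `ArrayVectorOps` object and its method names are hypothetical. It densifies sparse inputs and copies on every call, which is exactly the overhead the question is trying to avoid, so treat it as a correctness reference rather than a fast path.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper: arithmetic on mllib vectors using only the public
// toArray / Vectors.dense API. Every vector result is a new DenseVector.
object ArrayVectorOps {
  private def checkSizes(a: Vector, b: Vector): Unit =
    require(a.size == b.size, s"Vector sizes differ: ${a.size} vs ${b.size}")

  // Element-wise a + b (sparse inputs are densified by toArray).
  def add(a: Vector, b: Vector): Vector = {
    checkSizes(a, b)
    val x = a.toArray
    val y = b.toArray
    Vectors.dense(Array.tabulate(x.length)(i => x(i) + y(i)))
  }

  // alpha * a
  def scale(alpha: Double, a: Vector): Vector =
    Vectors.dense(a.toArray.map(_ * alpha))

  // Sum over i of a(i) * b(i)
  def dot(a: Vector, b: Vector): Double = {
    checkSizes(a, b)
    val x = a.toArray
    val y = b.toArray
    var sum = 0.0
    var i = 0
    while (i < x.length) {
      sum += x(i) * y(i)
      i += 1
    }
    sum
  }
}
```

For example, the dot product of `Vectors.dense(1.0, 2.0, 3.0)` and `Vectors.sparse(3, Array(0, 2), Array(4.0, 5.0))` is 1·4 + 2·0 + 3·5 = 19.0.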

Spark's own vector classes don't seem to offer any arithmetic functions.

Spark can convert its own vectors to Breeze (a Scala numerical processing library), which has all the bells and whistles, but it doesn't allow Breeze vectors to be converted back to Spark vectors.
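If the built-in Spark-to-Breeze conversion isn't accessible from your code, the same wrapping can be done by hand from the public `values`, `indices`, and `size` members of the mllib classes. A minimal sketch, assuming `spark-mllib` (which pulls in Breeze) is on the classpath; the `SparkToBreeze` object is hypothetical. The dense case shares the underlying array instead of copying it, so in-place Breeze updates will also modify the original Spark vector.

```scala
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

// Hypothetical helper: wraps an mllib vector in the matching Breeze type.
object SparkToBreeze {
  def toBreeze(v: Vector): BV[Double] = v match {
    // Shares dv.values (no copy): mutating the Breeze vector changes dv too.
    case dv: DenseVector => new BDV[Double](dv.values)
    // mllib SparseVector keeps its indices sorted, as Breeze's SparseVector expects.
    case sv: SparseVector => new BSV[Double](sv.indices, sv.values, sv.size)
  }
}
```

Once both operands are Breeze vectors, Breeze's own operators (`+`, `*` with a scalar, `dot`) can be applied to them.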

Are there functions for doing arithmetic with Spark's vectors, or is there an easy and efficient way to convert Breeze vectors to Spark's vectors?
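The reverse direction can be sketched with the public `Vectors.dense` and `Vectors.sparse` factories, reusing Breeze's backing arrays whenever their layout already matches what Spark expects, so no intermediate copy is made in the common case. This assumes `spark-mllib` with its bundled Breeze; the `BreezeToSpark` object is hypothetical. A dense Breeze vector can be a strided or offset view into a larger array, and a sparse one can carry spare capacity past `activeSize`; only in those cases does the sketch fall back to copying.

```scala
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper: converts a Breeze vector to an mllib vector.
object BreezeToSpark {
  def fromBreeze(bv: BV[Double]): Vector = bv match {
    case v: BDV[Double] =>
      // A plain, unsliced DenseVector can hand over its array without copying.
      if (v.offset == 0 && v.stride == 1 && v.length == v.data.length)
        Vectors.dense(v.data)
      else
        Vectors.dense(v.toArray) // view or strided slice: copy out the elements
    case v: BSV[Double] =>
      // Backing arrays may be longer than the number of stored entries.
      if (v.index.length == v.activeSize)
        Vectors.sparse(v.length, v.index, v.data)
      else
        Vectors.sparse(v.length, v.index.take(v.activeSize), v.data.take(v.activeSize))
    case other =>
      // Any other Breeze vector type (e.g. HashVector): densify.
      Vectors.dense(other.toArray)
  }
}
```

Because the no-copy branches share arrays, mutating the Breeze vector afterwards also changes the resulting Spark vector; call `.copy` on the Breeze side first if that matters.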


Update:

There's also a `Vector` implementation in `org.apache.spark.util` that does support arithmetic, but it seems completely disconnected from the implementation in `org.apache.spark.mllib.linalg`, which is the one I'm interested in.

zero323
Jonathan
  • Apparently, that feature you want is hidden, so the only option is to convert your vector to an `Array` and then into a `Breeze` vector. – Alberto Bonsanto Apr 19 '16 at 15:10
  • @Alberto Bonsanto It's the other direction (Breeze to Spark) that Spark doesn't support, but I guess the same answer could apply there. Always converting to an `Array` in between seems inefficient, though. Does anyone know how to do this in Java? – Jonathan Apr 19 '16 at 15:27
