
Is it possible to convert a Breeze dense matrix to a NumPy array using Spark?

I have a Breeze dense matrix that I want to convert to a NumPy array.


Alger Remirata
  • I am working on apache toree which is a scala-spark environment but it's fine to have a direct conversion from breeze to numpy without using spark. – Alger Remirata Dec 12 '16 at 15:42
  • can you please share code that illustrates your question, instead of copy pasting an image? – mtoto Dec 12 '16 at 15:43
  • Basically, my question does not depend on the code. It is a general question about ways to convert Breeze dense matrices to NumPy arrays. I only added the picture because Stack Overflow raised an error for lack of details when I tried to post just this question: Is it possible to convert a breeze dense matrix to numpy array using spark? – Alger Remirata Dec 12 '16 at 15:45
  • no it is not, one is a scala object, the other is python. – mtoto Dec 12 '16 at 15:48

1 Answer


Here is a way that works correctly but is slow and inefficient, because it creates multiple copies of the data. I used the Zeppelin spark and pyspark interpreters (I guess the same should be possible in Toree):

In Spark:

%spark
import breeze.linalg._
import breeze.numerics._
// Share a 4x4 identity matrix with the other interpreters through the ZeppelinContext
z.put("matrix", DenseMatrix.eye[Double](4))
z.get("matrix")


Then in Python:

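%pyspark
import numpy as np

def breeze2numpy(breeze_matrix):
    # copy() should yield a compact column-major matrix, so data() holds exactly rows * cols values
    data = list(breeze_matrix.copy().data())
    # Breeze stores values column by column, hence order='F'
    return np.array(data).reshape(breeze_matrix.rows(), breeze_matrix.cols(), order='F')

# z.z is the JVM-side ZeppelinContext; get() returns a py4j proxy of the Breeze matrix
breeze2numpy(z.z.get("matrix"))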
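To see why the reshape needs order='F', here is a small sketch that runs without Spark or Zeppelin. The list `data` is a hypothetical stand-in for what `data()` would return for a 2x3 Breeze matrix with rows (1, 2, 3) and (4, 5, 6), given that Breeze lays values out column by column:

```python
import numpy as np

# Hypothetical stand-in for breeze_matrix.data(): the 2x3 matrix
# [[1, 2, 3], [4, 5, 6]] flattened column by column.
data = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
rows, cols = 2, 3

# Column-major (Fortran) order recovers the original matrix.
col_major = np.array(data).reshape(rows, cols, order="F")

# The default row-major (C) order scrambles the entries.
row_major = np.array(data).reshape(rows, cols)

print(col_major)
print(row_major)
```

With the default order the shape is still 2x3, so the mistake is easy to miss unless the values are checked.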


This works, but it will be impractical for big datasets because the data is copied several times on its way through a Python list. It would be nice to have a zero-copy method based on Python's buffer protocol, like the one that exists for converting a C++ Eigen matrix to a NumPy array.
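One way to at least avoid the intermediate Python list, sketched here under assumptions I have not tested in Zeppelin or Toree: if the JVM side can hand over the matrix values as raw bytes (for example by writing the doubles into a java.nio.ByteBuffer, which is big-endian by default), NumPy can interpret those bytes directly. The helper name `bytes_to_matrix` and its parameters are hypothetical, and the input bytes are simulated with NumPy so the block runs on its own:

```python
import numpy as np

def bytes_to_matrix(raw, rows, cols, byteorder=">"):
    """Interpret raw bytes as a column-major float64 matrix.

    byteorder ">" means big-endian (assumed here because it is the
    java.nio.ByteBuffer default); pass "<" for little-endian data.
    """
    flat = np.frombuffer(raw, dtype=np.dtype(byteorder + "f8"))
    # Reshaping a contiguous 1-D array in Fortran order gives a view, not a copy.
    return flat.reshape((rows, cols), order="F")

# Simulated JVM output: a 2x3 matrix [[1, 2, 3], [4, 5, 6]] in column-major order.
raw = np.array([1.0, 4.0, 2.0, 5.0, 3.0, 6.0], dtype=">f8").tobytes()

m = bytes_to_matrix(raw, 2, 3)
print(m)
```

This is not truly zero-copy, since the bytes still have to cross from the JVM to the Python process, but np.frombuffer itself does not copy the buffer. The resulting array is read-only because it is backed by an immutable bytes object.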