Let's say I have these two NumPy arrays:
import numpy as np

A = np.arange(1024 ** 2, dtype=np.float64).reshape(1024, 1024)
B = np.arange(1024 ** 2, dtype=np.float64).reshape(1024, 1024)
and I perform the following on them:
np.sum(np.dot(A, B))
Now, I'd like to perform essentially the same calculation on the same matrices with PySpark, so that the work is distributed across my Spark cluster.
Does anyone know how to do this, or have a sample that does something along these lines in PySpark?
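For context, here's a rough sketch of the direction I've been considering, using BlockMatrix from pyspark.mllib.linalg.distributed. The block size of 128 and the row-by-row parallelization are just guesses on my part, and I don't know whether this is the idiomatic approach:

import numpy as np
from pyspark.sql import SparkSession
from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

spark = SparkSession.builder.appName("distributed-matmul").getOrCreate()
sc = spark.sparkContext

n = 1024
A = np.arange(n ** 2, dtype=np.float64).reshape(n, n)
B = np.arange(n ** 2, dtype=np.float64).reshape(n, n)

# Ship each matrix to the cluster one row at a time, then re-block it.
# The 128x128 block size is an arbitrary choice on my part.
blocks_A = IndexedRowMatrix(
    sc.parallelize([IndexedRow(i, A[i]) for i in range(n)])
).toBlockMatrix(128, 128)
blocks_B = IndexedRowMatrix(
    sc.parallelize([IndexedRow(i, B[i]) for i in range(n)])
).toBlockMatrix(128, 128)

# Distributed matrix multiply, intended to match np.dot(A, B).
product = blocks_A.multiply(blocks_B)

# Sum every entry: sum each block locally, then reduce across blocks.
total = product.blocks.map(lambda kv: kv[1].toArray().sum()).sum()
print(total)

In particular, I'm not sure whether summing the result block by block like this is the right way to reduce it, or whether there's a better-performing way to get the matrices onto the cluster in the first place.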
Thank you very much for any help!