I am a newbie at both Spark and Python. I have multiple vectors (Type A) in hand and am trying to compute their dot products with another single vector (Type B). To speed up the process, I'd like to implement this with Python 3.4 on a Spark cluster, so that the dot product of each Type A vector with the Type B vector is computed on a different node. I have the code below:
import numpy as np
from pyspark import SparkContext

sc = SparkContext()

# Type A vectors
a = [[1, 2, 3], [4, 5, 6]]
# Type B vector
b = [7, 8, 9]

# collect() brings the Type A vectors back to the driver,
# then np.dot runs locally on the driver
result = np.dot(sc.parallelize(a).collect(), b)
The code above does produce the correct answer, but my question is: does the way I am coding this fulfil my original expectation? If not, can anyone show me the correct approach?
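In case it helps to show what I have in mind, here is a map-based variant I sketched (using the same sc, a, and b as above). I'm assuming b is small enough to be captured in the map closure, so that each executor applies np.dot to its share of the Type A vectors and only the scalar results are collected back:

# Sketch: each executor computes np.dot for its slice of a;
# only the resulting scalars travel back to the driver
result = sc.parallelize(a).map(lambda v: np.dot(v, b)).collect()
# result == [50, 122]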
Many thanks in advance!