I have trained a word vector model and would now like to perform some operations on those vectors.
I am currently trying to figure out how to, for example, add up some vectors as shown below and then get synonyms for the resulting vector. The problem is that model.findSynonyms(org.apache.spark.mllib.linalg.Vector, Int) expects a Vector, while I only get Array[Float] from my model. So I tried to create a DenseVector, which in turn needs Array[Double], and the chaos is complete. Take a look yourself:
val model = Word2VecModel.load(sc, modelPath)
val headVector = model.getVectors.head
val synonyms = model.findSynonyms("computer", 10)
var aVec : Array[Float] = new Array(headVector._2.length)
for ((synonym, cosineSimilarity) <- synonyms) {
  println(s" $synonym $cosineSimilarity")
  // Returns Array[Float] (unfortunately)
  val bVec = model.getVectors.get(synonym).get
  val zipped = aVec zip bVec
  aVec = zipped.map(t => t._1 + t._2)
  println(util.Arrays.toString(aVec))
}
println(util.Arrays.toString(aVec))
// aVec is Array[Float] instead of Array[Double]
// or even better: something that implements org.apache.spark.mllib.linalg.Vector
val v : DenseVector = new DenseVector(aVec)
model.findSynonyms(aVec, 10)
This does not compile because aVec is an Array[Float] rather than an Array[Double], and Array[Float] is all I can get from my model. So either I am doing this completely wrong, or this is a major shortcoming of the API.
Is there another way to do this, or will I have to convert all my vectors between those two types? Why does my model not give me something that implements org.apache.spark.mllib.linalg.Vector?
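For reference, the conversion route mentioned above could look roughly like the following minimal sketch. It uses plain Scala only, so it runs without Spark; VectorSum and sumAsDouble are hypothetical names introduced for illustration and are not part of the Spark API.

```scala
// Hypothetical helper (not part of Spark): sums Float vectors element-wise
// and widens the result to Double, the element type that
// org.apache.spark.mllib.linalg.DenseVector is built from.
object VectorSum {
  def sumAsDouble(vectors: Seq[Array[Float]]): Array[Double] = {
    require(vectors.nonEmpty, "need at least one vector")
    val dim = vectors.head.length
    val acc = new Array[Double](dim) // starts as all zeros
    for (v <- vectors) {
      require(v.length == dim, "all vectors must have the same length")
      var i = 0
      while (i < dim) {
        acc(i) += v(i) // the Float value is widened to Double here
        i += 1
      }
    }
    acc
  }
}
```

With Spark on the classpath, the resulting Array[Double] could presumably be wrapped with Vectors.dense(...) and passed to model.findSynonyms(vector, 10). A single Array[Float] can be converted the same way with aVec.map(_.toDouble).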
I also tried the following, as suggested here:
val v1 = new org.apache.spark.mllib.linalg.DenseVector(aVec)
for ((synonym, cosineSimilarity) <- synonyms) {
  val v2 = new DenseVector(model.transform(synonym).toArray)
  val s = Vectors.dense((v1 + v2).toArray)
}
but I get a compilation error saying that v1 + v2 results in Array[Char] and not Array[Double].
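One possible reading of this error, which I have not verified: the mllib DenseVector does not appear to define a + operator, so the compiler may be falling back on string concatenation, and calling toArray on a String yields Array[Char]. Under that assumption, the addition would have to be done on the underlying arrays instead. A minimal plain-Scala sketch, where VectorOps and add are hypothetical names and not Spark API:

```scala
// Hypothetical helper (not part of Spark): element-wise sum of two
// Double arrays, such as those returned by calling toArray on mllib vectors.
object VectorOps {
  def add(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, "vectors must have the same length")
    // Pair up the components and add each pair
    a.zip(b).map { case (x, y) => x + y }
  }
}
```

Inside the loop this would presumably be used as Vectors.dense(VectorOps.add(v1.toArray, v2.toArray)), which keeps everything in Array[Double] and avoids the + call on the vectors themselves.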