Why can I only retrieve Array[Float] word vectors but have to pass mllib.linalg.Vector to w2v model?

Question

I have trained a word vector model and now I'd like to do some operations on those vectors.

Currently I am trying to figure out how to, e.g., add up some vectors as shown below and then get synonyms for the resulting vector. The problem is that model.findSynonyms(org.apache.spark.mllib.linalg.Vector, Int) is causing trouble, since I only get Array[Float] from my model. This is why I try to create a DenseVector, which in turn needs Array[Double], and the chaos is complete. But take a look yourself:

val model = Word2VecModel.load(sc, modelPath)
val headVector = model.getVectors.head
val synonyms = model.findSynonyms("computer", 10)
var aVec : Array[Float] = new Array(headVector._2.length)

for((synonym, cosineSimilarity) <- synonyms) {

  println(s"  $synonym $cosineSimilarity")

  // Returns Array[Float] (unfortunately)
  val bVec = model.getVectors.get(synonym).get
  val zipped = aVec zip bVec

  aVec = zipped.map(t => t._1 + t._2)
  println(util.Arrays.toString(aVec))
}

println(util.Arrays.toString(aVec))

// aVec is Array[Float] instead of Array[Double]
// or even better: something that implements org.apache.spark.mllib.linalg.Vector
val v : DenseVector = new DenseVector(aVec)

model.findSynonyms(aVec, 10)

This does not compile since aVec is Array[Float] instead of Array[Double]. This is because I can only get Array[Float] from my model. So either I am doing it completely wrong or that's a major issue with the API.

Is there another way to do this or will I have to convert all my vectors between those two types? Why am I not getting something from my model that implements org.apache.spark.mllib.linalg.Vector?

I also tried this, as suggested here:

val v1 = new org.apache.spark.mllib.linalg.DenseVector(aVec)

for((synonym, cosineSimilarity) <- synonyms) {

  val v2 = new DenseVector(model.transform(synonym).toArray)

  val s = Vectors.dense((v1 + v2).toArray)
}

but I get a compilation error saying that v1 + v2 results in Array[Char] and not Array[Double].

score 0 · Answer 1 · answered Apr 12 '16 at 22:57

Your second example will not work because your synonym is a String, which gets converted to an array of characters:

model.transform(synonym).toArray

So v1 + v2 does not perform an element-wise vector addition; the result is an Array[Char], as you discovered.
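One way to get an actual element-wise sum is to add the underlying arrays yourself, since mllib vectors (as far as I know) do not offer arithmetic operators. This is a minimal sketch in plain Scala, assuming both vectors have already been turned into Array[Double] with toArray; addArrays is a hypothetical helper, not part of Spark:

```scala
// Hypothetical helper (not a Spark API): element-wise sum of two equal-length arrays.
def addArrays(a: Array[Double], b: Array[Double]): Array[Double] = {
  require(a.length == b.length, "vectors must have the same dimension")
  a.zip(b).map { case (x, y) => x + y }
}

// Stand-ins for v1.toArray and v2.toArray from the question.
val a1 = Array(1.0, 2.0, 3.0)
val a2 = Array(0.5, 0.5, 0.5)
val sum = addArrays(a1, a2)

// With Spark on the classpath, Vectors.dense(sum) would wrap the result
// as an org.apache.spark.mllib.linalg.Vector again.
```

The sample values are made up for illustration; in the question they would come from v1.toArray and model.transform(synonym).toArray.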

As for the original question: yes, you will need to convert from Array[Float] to Array[Double].
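The conversion itself is short. The following is only a sketch under assumptions: toDoubles is a hypothetical helper name, the sample array stands in for the summed aVec from the question, and the Spark calls are left as comments so the snippet runs without Spark:

```scala
// Hypothetical helper: widen each Float component to Double.
def toDoubles(v: Array[Float]): Array[Double] = v.map(_.toDouble)

// Stand-in for the accumulated aVec (Array[Float]) from the question.
val aVec: Array[Float] = Array(1.5f, 2.25f, -0.5f)
val asDoubles: Array[Double] = toDoubles(aVec)

// With Spark available, the widened array can be wrapped and passed on:
//   import org.apache.spark.mllib.linalg.Vectors
//   val query = Vectors.dense(asDoubles)
//   val nearest = model.findSynonyms(query, 10)
```

Widening Float to Double is lossless, so the conversion costs an extra array copy but does not change the component values.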
