
I am trying to cluster words into categories, so I wanted to use spaCy's Word2Vec-style word vectors. This already works for common words like banana, apple and car: it returns closely related words.

When the words get more specific or technical, like car, battery, accumulator, accu and so on, spaCy returns zero vectors. So these words are apparently not included in the library.

Do you have any suggestions?

Furthermore, I have to do this in German.

Thank you very much, Jokulema

Jokulema

2 Answers

1

The documentation says that spaCy's word vectors need a model in order to cover a wide variety of words.

They also give an example of a model which includes ~1 million words and show how to download it:

python -m spacy download en_core_web_lg

Please read the documentation here: https://spacy.io/usage/spacy-101#vectors-similarity
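
To find out which of your words the model really has no vector for, a check along these lines might help. This is only a minimal sketch: it uses a blank German pipeline with two hand-made toy vectors (the words, the numbers and the helper name `words_without_vectors` are all made up for illustration) so that it runs without downloading anything.

```python
import numpy as np
import spacy

# Blank German pipeline: tokenizer only, no pretrained vectors.
nlp = spacy.blank("de")

# Toy vectors, invented for this example so the check has something to find.
nlp.vocab.set_vector("Auto", np.array([1.0, 0.0, 0.0], dtype="float32"))
nlp.vocab.set_vector("Batterie", np.array([0.9, 0.1, 0.0], dtype="float32"))


def words_without_vectors(nlp, words):
    """Return the words for which the vocabulary has no vector."""
    missing = []
    for word in words:
        lexeme = nlp.vocab[word]
        # has_vector is False for words outside the vector table;
        # their .vector is then all zeros.
        if not lexeme.has_vector:
            missing.append(word)
    return missing


missing = words_without_vectors(nlp, ["Auto", "Batterie", "Akku", "Akkumulator"])
print(missing)
```

With a real setup you would replace `spacy.blank("de")` and the toy vectors with `spacy.load(...)` on a German model that ships vectors (for example `de_core_news_md`, assuming it is installed) and pass in your own word list.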

V. Sambor
  • Yes, I downloaded the model (the German one), but the technical words are not included, so it gives back an empty vector. – Jokulema Jan 26 '20 at 08:25
0

If you need word-vectors for words not in the model you're using, you'll have to either:

  • find & use a different model that contains those words

  • train your own model from your own training data, that contains many examples of those words' usages in context

gojomo