
I have two different pretrained word embedding models that I want to combine, so that a word missing from one model can be complemented by the other (in case the other model has the word that is missing in the first). The problem is that the vectors have different dimensions in the two models: the first model's vectors have 300 dimensions and the second model's have 1000.

Can I simply retain the first 300 dimensions of the second model, discard the remaining 700, and build one combined model of 300 dimensions?
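
To make the idea concrete, the naive truncation I have in mind would look something like this (a hypothetical numpy sketch; `vectors_1000` just stands in for the second model's vector matrix):

```python
import numpy as np

# Stand-in for the second model's vectors: one 1000-dim row per word.
vectors_1000 = np.random.rand(50000, 1000)

# Naive idea: keep only the first 300 dimensions of every vector.
vectors_300 = vectors_1000[:, :300]
print(vectors_300.shape)  # (50000, 300)
```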

Golam Kawsar

1 Answer


Since the two models were trained separately, they will not "semantically align", even if they had the same dimensionality. Because there is randomness in the initialisation of training, two independently trained vector sets cannot be compared directly. The topological aspects, i.e. the relations between the vectors in high-dimensional space, will most likely be similar, but two vectors corresponding to the same word in two independent vector sets will not lie in the same position.
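
You can convince yourself of this with a quick experiment (a minimal sketch using gensim's Word2Vec, assuming the gensim 4.x API and a made-up toy corpus):

```python
import numpy as np
from gensim.models import Word2Vec

sentences = [["the", "cat", "sits", "on", "the", "mat"],
             ["the", "dog", "lies", "on", "the", "rug"]] * 100

# Train twice on the *same* corpus, differing only in the random seed.
m1 = Word2Vec(sentences, vector_size=50, min_count=1, seed=1, workers=1)
m2 = Word2Vec(sentences, vector_size=50, min_count=1, seed=2, workers=1)

# Compare the two vectors for the same word across the two models.
v1, v2 = m1.wv["cat"], m2.wv["cat"]
cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(cos)  # typically nowhere near 1.0: same word, incomparable spaces
```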

There are dimensionality reduction algorithms that can reduce the dimensionality from 1000 to 300 (SVD, PCA, SOM, autoencoders), but as mentioned above, this alone won't solve your problem.
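
For completeness, such a reduction is a one-liner with scikit-learn's PCA (a sketch; `vectors_1000` is a placeholder for the second model's matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for the second model's 1000-dimensional vectors.
vectors_1000 = np.random.rand(50000, 1000)

# Project onto the 300 directions of maximum variance.
vectors_300 = PCA(n_components=300).fit_transform(vectors_1000)
print(vectors_300.shape)  # (50000, 300)
```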

I would suggest retraining a model on corpora containing the full vocabulary, if possible (see the sketch below). Even if there were some fancy way of combining two independent models, I would expect the result to suffer in quality.
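
If you have both corpora available, that could be as simple as (a sketch, again assuming gensim; `corpus_a` and `corpus_b` are placeholders for your tokenised texts):

```python
from gensim.models import Word2Vec

# Placeholders: tokenised sentences from the two original sources.
corpus_a = [["the", "cat", "sits", "on", "the", "mat"]] * 50
corpus_b = [["a", "dog", "lies", "on", "a", "rug"]] * 50

# One model trained on the union, so all vectors share a single space.
combined = Word2Vec(corpus_a + corpus_b, vector_size=300, min_count=1)
combined.save("combined.model")
```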

perfall
  • If there are a lot of overlapping words in both models, there *is* a way to learn a transformation that could project the extra words into the other model – see this answer: https://stackoverflow.com/questions/47507091/creating-a-wordvector-model-combining-words-from-other-models/47515042#47515042 – But, not sure that code currently works with simultaneous dimensionality-reduction, and it is still likely best to create a combined corpus, with varied examples of all words of interest, then train a fresh model where the word-vectors are assured, by the co-training, of being comparable. – gojomo Dec 13 '17 at 18:11
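
In the spirit of gojomo's comment, the linked approach learns a projection from the vocabulary the two models share. A minimal least-squares version (a sketch with made-up data; in practice the rows of `A` and `B` would be the two models' vectors for the same shared words) could look like:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 1000))  # shared words in model 2's space
B = rng.standard_normal((5000, 300))   # the same words in model 1's space

# Least-squares fit of a 1000x300 matrix W such that A @ W ≈ B.
W, *_ = np.linalg.lstsq(A, B, rcond=None)

# Project a word that exists only in model 2 into model 1's space.
only_in_model_2 = rng.standard_normal(1000)
projected = only_in_model_2 @ W
print(projected.shape)  # (300,)
```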