I am working on an NLP problem with gensim that requires multilingual embeddings. I have the pretrained, aligned .txt embeddings that FastText provides on their website. Unfortunately, they don't provide the full model, and these vectors are missing some vocabulary that is important for my problem; this is exactly where the character n-gram (subword) ability of a full FastText model would come in very handy.
My questions:
- Is there a way to recreate the full model, so that I can infer new vocabulary that lands in the same vector space as the aligned embeddings?
- If not, is there still a way to obtain those terms in that aligned embedding space, without having to train an entire new FastText model and then align it to the already pre-trained ones?