Loading a pre-trained fastText model

Question

I have a question about fasttext (https://fasttext.cc/). I want to download a pre-trained model and use it to retrieve the word vectors from text.

After downloading the pre-trained model (https://fasttext.cc/docs/en/english-vectors.html) I unzipped it and got a .vec file. How do I import this into fasttext?

I've tried to use the function given on that page as follows:

import fasttext
import io

def load_vectors(fname):
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = map(float, tokens[1:])
    return data

vectors = load_vectors('/Users/username/Downloads/wiki-news-300d-1M.vec')
model = fasttext.load_model(vectors)

However, I can't run this code to completion because Python crashes. How can I successfully load these pre-trained word vectors?

Thank you for your help.

Please edit your question to specify whether there is an error message. – ygorg, Apr 14 '21 at 12:23
How big is the vector file? How much RAM does your machine have? – dennlinger, Apr 14 '21 at 12:48

ygorg · Accepted Answer · 2021-09-21T12:56:36.567

7

fastText's advantage over word2vec or GloVe, for example, is that it uses subword information to return vectors for out-of-vocabulary (OOV) words.
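
To make the subword idea concrete, here is a minimal sketch of character n-gram extraction, the building block behind fastText's OOV handling. The function name char_ngrams is hypothetical and not part of the fasttext package. The boundary markers < and > and the lengths 3 to 6 are assumed here to mirror the library's defaults for unsupervised training. The real implementation also hashes n-grams into buckets, which this sketch leaves out.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of a word, fastText-style (simplified).

    Hypothetical helper for illustration only; minn/maxn are assumed defaults.
    """
    # Wrap the word in boundary markers so prefixes and suffixes are distinct.
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


trigrams = char_ngrams("where", minn=3, maxn=3)
print(trigrams)
```

An unseen word still shares many of these n-grams with words from the training vocabulary, which is how a vector can be composed for it.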

So two types of pretrained models are offered: .vec and .bin.

.vec is a text file that is essentially a dictionary Dict[word, vector]: the word vectors are pre-computed for the words in the training vocabulary only.

.bin is a binary fasttext model that can be loaded using fasttext.load_model('file.bin') and that can provide word vectors for unseen (OOV) words, be trained further, etc.

In your case you are loading a .vec file, so vectors is the "final form" of the data. fasttext.load_model expects a .bin file.
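
As a rough sketch of the .bin route, assuming the fasttext Python package is installed and a .bin model has been downloaded: the helper names load_bin_model and word_vector are hypothetical, and 'file.bin' is a placeholder path. The import sits inside the function so the snippet can be read and defined without the package present.

```python
def load_bin_model(model_path):
    """Load a binary fastText model from a path string (hypothetical helper)."""
    # Imported lazily so defining this sketch does not require the package.
    import fasttext

    # load_model takes a file path (str), not a dictionary of vectors.
    return fasttext.load_model(model_path)


def word_vector(model, word):
    """Return the vector for a word, including out-of-vocabulary words."""
    return model.get_word_vector(word)


# Example usage (needs a downloaded .bin file; the path is a placeholder):
# model = load_bin_model('file.bin')
# vec = word_vector(model, 'someunseenword')
# print(len(vec), model.get_dimension())
```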

If you need more than a Python dictionary, you can use gensim.models.KeyedVectors (which handles any word vectors, such as word2vec, GloVe, etc.).
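
A minimal sketch of the gensim route, assuming gensim is installed. The .vec file uses the word2vec text format (a header line with the word count and dimension, then one word per line), so KeyedVectors.load_word2vec_format can read it. The helper name load_vec_file is hypothetical. The limit argument caps how many vectors are read, which may help if memory turns out to be the problem raised in the comments on the question.

```python
def load_vec_file(vec_path, limit=None):
    """Load a .vec file into gensim KeyedVectors (hypothetical helper)."""
    # Imported lazily so defining this sketch does not require gensim.
    from gensim.models import KeyedVectors

    # binary=False because .vec is plain text; limit=None reads every vector.
    return KeyedVectors.load_word2vec_format(vec_path, binary=False, limit=limit)


# Example usage (path taken from the question):
# vectors = load_vec_file('/Users/username/Downloads/wiki-news-300d-1M.vec', limit=100_000)
# print(vectors['king'][:5])                 # first components of one vector
# print(vectors.most_similar('king', topn=5))
```

Note that vectors loaded this way cover only the words stored in the file. There is no subword fallback for OOV words.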

edited Sep 21 '21 at 12:56

answered Apr 14 '21 at 13:06

ygorg



Any idea how to load a `.vec` file using the fasttext module? – Swapnil Masurekar Jan 03 '22 at 10:07

@SwapnilMasurekar have you checked this function? https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.load_word2vec_format – ygorg Jan 04 '22 at 17:50

score 0 · Answer 2 · answered May 28 '22 at 21:23

I use the following code to load the .vec file in Python 3, where PATH_TO_FASTTEXT is the path to the .vec file.

Most notably, the map needs to be explicitly converted to a list, because in Python 3 map() returns a lazy iterator rather than a list.


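import io
from typing import Dict, List

def LoadFastText():
    # PATH_TO_FASTTEXT must be defined beforehand as the path to the .vec file.
    word_to_vector: Dict[str, List[float]] = dict()
    with io.open(PATH_TO_FASTTEXT, 'r', encoding='utf-8', newline='\n', errors='ignore') as input_file:
        # The first line holds the vocabulary size and the vector dimension.
        no_of_words, vector_size = map(int, input_file.readline().split())
        for line in input_file:
            tokens = line.rstrip().split(' ')
            word = tokens[0]
            # Materialize the lazy map object as a list of floats.
            vector = list(map(float, tokens[1:]))
            assert len(vector) == vector_size
            word_to_vector[word] = vector
    return word_to_vector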
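
To illustrate what can be done with such a dictionary, here is a small self-contained sketch that parses an in-memory sample in the same layout as a .vec file and compares two words by cosine similarity. The sample data, parse_vec, and cosine are all hypothetical and made up for illustration. Real files hold 300-dimensional vectors.

```python
import io
import math

# Hypothetical 3-word, 2-dimensional sample in the .vec text layout.
SAMPLE_VEC = "3 2\nking 1.0 0.0\nqueen 0.8 0.6\napple 0.0 1.0\n"


def parse_vec(stream):
    """Parse a .vec-style text stream into (vectors, word_count, dimension)."""
    # Header line: number of words, then vector dimension.
    n_words, dim = map(int, stream.readline().split())
    vectors = {}
    for line in stream:
        tokens = line.rstrip().split(' ')
        vectors[tokens[0]] = [float(x) for x in tokens[1:]]
    return vectors, n_words, dim


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


vectors, n_words, dim = parse_vec(io.StringIO(SAMPLE_VEC))
similarity = cosine(vectors["king"], vectors["queen"])
print(similarity)
```

A lookup for a word missing from the dictionary raises KeyError, since a plain dictionary has no way to build vectors for unseen words.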

How do you build a model out of those vectors then? I tried to use `load_model` for that and passed the vectors in as a parameter, but I get the following error: ```TypeError: loadModel(): incompatible function arguments. The following argument types are supported: 1. (self: fasttext_pybind.fasttext, arg0: str) -> None``` – Deil, Jun 08 '23 at 19:56
