Questions tagged [fasttext]

fastText is a library for efficient learning of word representations and sentence classification. See https://github.com/facebookresearch/fastText for more information.

465 questions
7 votes, 3 answers

Use tf-idf with FastText vectors

I'm interested in using tf-idf with the FastText library, but haven't found a logical way to handle the n-grams. I have already used tf-idf with spaCy vectors, for which I found several examples like these:…
Luis Ramon Ramirez Rodriguez (9,591 reputation)
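A minimal sketch of tf-idf weighted averaging, assuming the n-gram question can be sidestepped by weighting at the word level: fastText composes each word's vector from its character n-grams internally, so the tf-idf weights only need to be computed per word. The `lookup` callable is a stand-in for something like the fasttext Python package's `model.get_word_vector`; the toy vectors, the smoothed idf formula and the function names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def idf_table(docs: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency for each token in a tokenized corpus."""
    n_docs = len(docs)
    # Document frequency: count each token at most once per document.
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf_doc_vector(
    doc: Sequence[str],
    idf: Dict[str, float],
    lookup: Callable[[str], Vector],
    dim: int,
    default_idf: float = 1.0,
) -> Vector:
    """Average the word vectors of `doc`, weighting each word by tf * idf."""
    acc = [0.0] * dim
    total_weight = 0.0
    for tok, count in Counter(doc).items():
        # Words not seen when fitting the idf table get default_idf.
        weight = count * idf.get(tok, default_idf)
        vec = lookup(tok)  # a fastText model can return a vector even for OOV words
        for i in range(dim):
            acc[i] += weight * vec[i]
        total_weight += weight
    if total_weight == 0.0:
        return acc  # empty document: zero vector
    return [x / total_weight for x in acc]


if __name__ == "__main__":
    toy = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}  # hypothetical vectors
    corpus = [["a", "b"], ["a", "c"]]
    idf = idf_table(corpus)
    print(tfidf_doc_vector(corpus[0], idf, toy.__getitem__, dim=2))
```

Because the rarer word `b` has the higher idf, the document vector leans toward `b`'s direction.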
7 votes, 2 answers

How to handle unbalanced label data using FastText?

In FastText, I have unbalanced labels. What is the best way to handle it?
Gil Lev (121 reputation)
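One common workaround is random oversampling: duplicate examples of the smaller classes until every label has as many lines as the largest one. The sketch below assumes the training file uses fastText's `__label__` prefix format and groups lines by their first label only; the function name and the fixed seed are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def oversample(lines: Sequence[str], seed: int = 0) -> List[str]:
    """Return a shuffled list where every label has as many lines as the most common one."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        label = line.split(maxsplit=1)[0]  # e.g. "__label__positive"
        by_label[label].append(line)
    if not by_label:
        return []
    target = max(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced: List[str] = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample extra copies with replacement until this label reaches the target count.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced
```

Oversampling should be applied to the training split only; otherwise duplicated lines can leak into the evaluation data and inflate the scores.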
6 votes, 1 answer

SPACY - Confusion about word vectors and tok2vec

It would be really helpful if you could help me understand some underlying concepts about spaCy. I understand some spaCy models have predefined static vectors; for example, for the Spanish models these are the vectors generated by…
BaldML (133 reputation)
6 votes, 2 answers

Latest Pre-trained Multilingual Word Embedding

Are there any recent pre-trained multilingual word embeddings (where multiple languages are jointly mapped to the same vector space)? I have looked at the following, but they don't fit my needs: FastText / MUSE…
6 votes, 1 answer

gensim - fasttext - Why `load_facebook_vectors` doesn't work?

I've tried to load pre-trained FastText vectors from fastText's wiki word vectors. My code is below, and it works well: `from gensim.models import FastText` `model = FastText.load_fasttext_format('./wiki.en/wiki.en.bin')` But the warning message is a…
frhyme (966 reputation)
6 votes, 2 answers

How to save fasttext model in binary and text formats?

The documentation is a bit unclear about how to save the fasttext model to disk. How do you specify a path in the argument? I tried doing so and it failed with an error. Example in documentation: >>> from gensim.test.utils import get_tmpfile >>> >>> fname…
erotavlas (4,274 reputation)
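For the text side of this question, the `.vec` layout that fastText writes is simple enough to produce by hand: a header line with the vocabulary size and dimension, then one line per word with its components separated by spaces. gensim's `model.wv.save_word2vec_format(path, binary=False)` writes this same layout, while `model.save(path)` uses gensim's own format. The writer/reader pair below is an illustrative sketch, not gensim's implementation; the function names are hypothetical.

```python
import io
from typing import Dict, List, TextIO


def write_vec(vectors: Dict[str, List[float]], out: TextIO) -> None:
    """Write vectors in the word2vec/fastText text layout."""
    dim = len(next(iter(vectors.values()))) if vectors else 0
    out.write(f"{len(vectors)} {dim}\n")  # header: vocabulary size and dimension
    for word, vec in vectors.items():
        out.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


def read_vec(inp: TextIO) -> Dict[str, List[float]]:
    """Read the same layout back; tolerant of trailing spaces on each line."""
    count, dim = map(int, inp.readline().split())
    vectors: Dict[str, List[float]] = {}
    for _ in range(count):
        parts = inp.readline().split()
        vectors[parts[0]] = [float(x) for x in parts[1:dim + 1]]
    return vectors


if __name__ == "__main__":
    buf = io.StringIO()
    write_vec({"hello": [0.5, -1.25], "world": [0.0, 2.0]}, buf)
    print(buf.getvalue())
```

The text format is portable but larger and slower to load than a binary format, and it keeps only whole-word vectors, not the subword information a fastText model needs for OOV words.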
6 votes, 2 answers

Error when loading FastText's French pre-trained model with gensim

I am trying to use FastText's French pre-trained binary model (downloaded from the official FastText GitHub page). I need the .bin model and not the .vec word vectors so as to approximate misspelled and out-of-vocabulary words. However when I…
Clara-sininen (191 reputation)
6 votes, 2 answers

fasttext cannot load training txt file

I am trying to train a fasttext classifier on Windows using the fasttext Python package. I have a UTF-8 file with lines like `__label__type1 sample sentence 1`, `__label__type2 sample sentence 2`, `__label__type1 sample sentence 3`. When I…
tahsintahsin (994 reputation)
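A sketch of a few sanity checks worth running on a training file before handing it to fastText, assuming the problem lies in the file itself rather than in the Python package. A UTF-8 byte-order mark (which some Windows editors add) becomes part of the first token, so the first label may not be recognized, and every example line should begin with the `__label__` prefix. Whether either applies to this question is an assumption; the helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

BOM = b"\xef\xbb\xbf"


def format_training_lines(examples: Iterable[Tuple[str, str]]) -> str:
    """Build fastText supervised lines, '__label__<label> <text>', one example per line."""
    out = []
    for label, text in examples:
        # Collapse internal whitespace and newlines so each example stays on one line.
        out.append(f"__label__{label} {' '.join(text.split())}")
    return "\n".join(out) + "\n"


def check_training_bytes(raw: bytes) -> List[str]:
    """Return human-readable problems found in the raw bytes of a training file."""
    problems = []
    if raw.startswith(BOM):
        problems.append("file starts with a UTF-8 byte-order mark")
        raw = raw[len(BOM):]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return problems + [f"not valid UTF-8: {exc}"]
    for n, line in enumerate(text.splitlines(), start=1):
        if line.strip() and not line.startswith("__label__"):
            problems.append(f"line {n} does not start with __label__")
    return problems
```

When writing the file from Python on Windows, `open(path, "w", encoding="utf-8", newline="\n")` writes no byte-order mark and disables platform newline translation; `check_training_bytes(open(path, "rb").read())` can then be run on the result.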
6 votes, 1 answer

Process finished with exit code -1073740791 (0xC0000409) pycharm error

I am trying to use fastText with PyCharm. Whenever I run the code below: `import fastText` `model = fastText.train_unsupervised("data_parsed.txt")` `model.save_model("model")` the process exits with this error: Process finished with exit code -1073740791…
user9857589 (61 reputation)
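The negative exit code is easier to search for once it is read as the unsigned 32-bit NTSTATUS value Windows actually returned. A small sketch of that conversion: `0xC0000409` is `STATUS_STACK_BUFFER_OVERRUN`, which Windows also reports for fail-fast terminations, so it suggests a crash inside native code (such as the compiled fastText extension) rather than a Python exception. The lookup table is deliberately tiny and illustrative.

```python
def as_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit process exit code as an unsigned hex NTSTATUS."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# A couple of well-known status codes; extend as needed.
KNOWN_STATUS = {
    "0xC0000409": "STATUS_STACK_BUFFER_OVERRUN",
    "0xC0000005": "STATUS_ACCESS_VIOLATION",
}


if __name__ == "__main__":
    code = as_ntstatus(-1073740791)
    print(code, KNOWN_STATUS.get(code, "unknown"))  # 0xC0000409 STATUS_STACK_BUFFER_OVERRUN
```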
5 votes, 1 answer

Unable to recreate Gensim docs for training FastText. TypeError: Either one of corpus_file or corpus_iterable value must be provided

I am trying to make my own FastText embeddings, so I went to the official Gensim documentation and ran the exact code below with version 4.0: `from gensim.models import FastText` `from gensim.test.utils import common_texts` `model =…`
Deshwal (3,436 reputation)
5 votes, 0 answers

Incorporate fasttext vectors in tf.keras embedding layer?

fastText can handle OOV words easily, i.e., it can be assumed that `emb = fasttext_model(raw_input)` always holds. However, I am not sure how to build this into a tf.keras embedding layer. I couldn't simply load the matrix into Embedding because in…
Mr.cysl (1,494 reputation)
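A fixed `Embedding` matrix cannot reproduce fastText's OOV behaviour because fastText builds a word's vector from the vectors of its character n-grams, with `<` and `>` marking the word boundaries (the unsupervised defaults are `minn=3`, `maxn=6`). The sketch below is a simplified version of that composition: the real library hashes n-grams into a fixed number of buckets and, for in-vocabulary words, also includes the word's own vector, whereas here a plain dict stands in for the n-gram table. One option for tf.keras, under these assumptions, is to compute such vectors outside the model and feed them in as inputs instead of relying on a frozen lookup matrix.

```python
from typing import Dict, List, Optional


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Character n-grams of the boundary-marked word, e.g. '<where>'."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def compose_vector(word: str, ngram_vectors: Dict[str, List[float]], dim: int,
                   minn: int = 3, maxn: int = 6) -> Optional[List[float]]:
    """Average the vectors of the word's known n-grams; None if none are known."""
    known = [ngram_vectors[g] for g in char_ngrams(word, minn, maxn) if g in ngram_vectors]
    if not known:
        return None
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
```

With `minn=maxn=3`, `char_ngrams("where", 3, 3)` gives `['<wh', 'whe', 'her', 'ere', 're>']`, the example used in the fastText subword paper.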
5 votes, 1 answer

Proper way to add new vectors for OOV words

I'm using some domain-specific language which has a lot of OOV words as well as some typos. I have noticed spaCy will just assign an all-zero vector to these OOV words, so I'm wondering what the proper way to handle this is. I appreciate…
BaldML (133 reputation)
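For static vectors without subword information, such as spaCy's, one heuristic for typos is to fall back to the closest in-vocabulary spelling. A sketch using the standard library's `difflib.get_close_matches`; the lowercase step, the 0.8 cutoff, and returning `None` (so the caller can choose between a zero vector and some other fallback) are assumptions to tune for the domain. A fastText model, by contrast, can compose vectors for such words from character n-grams.

```python
import difflib
from typing import Dict, List, Optional


def vector_with_fallback(word: str, vectors: Dict[str, List[float]],
                         cutoff: float = 0.8) -> Optional[List[float]]:
    """Exact match, then lowercase match, then the closest spelling above `cutoff`."""
    if word in vectors:
        return vectors[word]
    lowered = word.lower()
    if lowered in vectors:
        return vectors[lowered]
    matches = difflib.get_close_matches(lowered, list(vectors), n=1, cutoff=cutoff)
    if matches:
        return vectors[matches[0]]
    return None  # genuinely unknown: the caller decides what to do
```

`get_close_matches` scans the whole vocabulary on every call, so for large vocabularies it is worth caching the result per OOV word.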
5 votes, 1 answer

How to export a fasttext model created by gensim, to a binary file?

I'm trying to export the fasttext model created by gensim to a binary file, but the docs are unclear about how to achieve this. What I've done so far: `model.wv.save_word2vec_format('model.bin')` But this does not seem like the best solution.…
Farhood ET (1,432 reputation)
5 votes, 1 answer

How to get nearest neighbours in fasttext for unsupervised learning models (cbow, skipgram)?

The examples (related to word representations) on the official fastText website (fasttext.cc) suggest that it is possible to calculate the nearest neighbors on vectors derived with the cbow (or skip-gram) model, in short, on unsupervised learning models.…
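fastText's command-line tool has an `nn` command for this, and recent versions of the official Python bindings expose `get_nearest_neighbors`. What they compute is, in essence, a cosine-similarity ranking over the vocabulary. A brute-force sketch of that idea over a plain dict of word vectors, which here is a hypothetical stand-in for vectors taken from a cbow or skip-gram model:

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(query: str, vectors: Dict[str, List[float]],
                      k: int = 10) -> List[Tuple[str, float]]:
    """Top-k (word, cosine similarity) pairs, excluding the query word itself."""
    qvec = vectors[query]
    scored = ((cosine(qvec, vec), word) for word, vec in vectors.items() if word != query)
    return [(word, score) for score, word in heapq.nlargest(k, scored)]
```

A brute-force scan is linear in vocabulary size per query; pre-normalizing all vectors once turns each comparison into a plain dot product.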
5 votes, 1 answer

How can I maintain a temporary dictionary in a PySpark application?

I want to use a pretrained embedding model (fasttext) in a PySpark application. If I broadcast the file (.bin), the following exception is thrown: Traceback (most recent call last): cPickle.PicklingError: Could not serialize broadcast:…
bib (944 reputation)