Questions tagged [fasttext]

fastText is a library for efficient learning of word representations and sentence classification. See https://github.com/facebookresearch/fastText for more information.

465 questions
2
votes
1 answer

Gensim: Any chance to get word frequency in Word2Vec format?

I am doing research with a pre-trained fastText model and I need word frequencies for further analysis. Do the .vec or .bin files provided on the fastText website contain word frequency information? If yes, how do I get it? I am using…
qby pony
2
votes
2 answers

FastText .bin file cannot fit in memory, even though I have enough RAM

I'm trying to load one of the fastText pre-trained models, which comes as a .bin file. The .bin file is 2.8GB, and I have 8GB of RAM and an 8GB swap file. Unfortunately, the model starts loading, occupies almost 15GB, and then it breaks…
Kioko Key
2
votes
1 answer

Training a model from multiple corpora

Imagine I have a fastText model that was trained on Wikipedia articles (as explained on the official website). Would it be possible to train it again with another corpus (scientific documents) that could add new / more pertinent…
2
votes
1 answer

fastText: how to load a .csv column into model.predict

I am new to Python and NLP. I have followed this tutorial (https://fasttext.cc/docs/en/supervised-tutorial.html) to train my fastText supervised model in Python. I have a CSV with a Text column and I would like to predict labels for every row from the…
2
votes
1 answer

Understanding wordNgrams in fastText

I'm trying to understand what the -wordNgrams parameter in fastText does. Let's take the following text as an example: "The quick brown fox jumps over the lazy dog". Now, if we have a context window size of 2 at the word 'brown', then we would…
Kleyson Rios
2
votes
1 answer

Difference between max length of word ngrams and size of context window

The description of the fastText library for Python (https://github.com/facebookresearch/fastText/tree/master/python) lists several arguments for training a supervised model, including: ws: size of the context…
Akim Tsvigun
2
votes
1 answer

Are Principal Components of different word2vec models measuring the same thing?

I need to run multiple word2vec models over a period of time. For example, I will be running word2vec once every month. To reduce the computing workload, I would like to run word2vec only on the data that was accumulated during the last month. My…
2
votes
1 answer

Gensim most_similar() with fastText word vectors returns useless/meaningless words

I'm using Gensim with fastText word vectors to return similar words. This is my code: import gensim model = gensim.models.KeyedVectors.load_word2vec_format('cc.it.300.vec') words = model.most_similar(positive=['sole'],topn=10) print(words) This…
user2797134
2
votes
1 answer

What are the defaults for gensim's fasttext?

I cannot find anything about the default parameter values for gensim's fastText here. Or are they the same as for the original Facebook fastText implementation?
user9937436
2
votes
1 answer

fastText keeps predicting one label

I am trying to use fastText to label some data as [url] or [PN], just to test it. After training on 6k of each label, it keeps predicting [PN]. Training command: fasttext supervised -input input.txt -output model -minn 0 -maxn 0 -epoch 100…
Exorcismus
2
votes
3 answers

Reading a large pre-trained fastText word embedding file in Python

I am doing sentiment analysis and I want to use pre-trained fastText embeddings; however, the file is very large (6.7 GB) and the program takes ages to compile. fasttext_dir = '/Fasttext' embeddings_index = {} f = open(os.path.join(fasttext_dir,…
BlueMango
2
votes
1 answer

How to get a list of context words in Gensim

How do I get the most frequent context words from a pretrained fastText model? For example, for the word 'football' and the corpus ["I like playing football with my friends"], get the list of context words: ['playing', 'with', 'my', 'like']. I tried to use model_wiki =…
2
votes
1 answer

How does Facebook's fastText library handle numerical data in input for word vectorization?

I am using Facebook's fastText for text classification. I want to know how the fastText library handles numbers in a text string provided as input for word vectorization. Does fastText typecast each number as a string before creating word…
DK818
2
votes
1 answer

How to prepare data for word2vec in gensim and fastText?

I want to train word2vec and fastText to get vectors for a specific dataset that I have. What should my model take as input? My file looks like this: Customer_4: I want to book a ticket to New York. Agent_9: Okay, when do you want the tickets…
tstseby
2
votes
0 answers

gensim error: 'NoneType' object is not subscriptable during training in fastText

While implementing fastText in Python 3.7, I am facing an unexpected exception in a thread, which leads to 'NoneType' object is not subscriptable. The full stack trace (screenshot) is as follows: What exactly is this…
M S