Questions tagged [fasttext]

fastText is a library for efficient learning of word representations and sentence classification. See https://github.com/facebookresearch/fastText for more information.

465 questions
3 votes · 1 answer

What is the difference between args wordNgrams, minn and maxn in fasttext supervised learning?

I'm a little confused after reading "Bag of Tricks for Efficient Text Classification". What is the difference between the args wordNgrams, minn and maxn? For example, for a text classification task with GloVe embedding as…
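A useful way to see the difference: wordNgrams controls n-grams over word tokens, while minn/maxn control character n-grams inside each word. A minimal sketch in plain Python (no fastText required; fastText wraps each word in '<' and '>' boundary markers before slicing):

```python
def word_ngrams(tokens, n):
    """Contiguous word n-grams, the feature the wordNgrams arg enables."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(word, minn, maxn):
    """Character n-grams of length minn..maxn, as controlled by minn/maxn."""
    wrapped = "<" + word + ">"          # fastText's word boundary markers
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

print(word_ngrams(["the", "cat", "sat"], 2))   # word bigrams
print(char_ngrams("cat", 3, 4))                # subword n-grams of 'cat'
```

Word n-grams capture local word order for the classifier; character n-grams give subword features, which is why minn/maxn also let fastText build vectors for unseen words.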
3 votes · 0 answers

Full FastText model from KeyedVectors to infer new words in aligned space

I am working on an NLP problem with gensim that requires the use of multilingual embeddings. I have the pretrained and aligned .txt embeddings that FastText provides on their website. Sadly, they don't provide the full model, but these vectors…
Ed.
3 votes · 2 answers

Sentiment analysis of Italian sentences

If you have any experience with sentiment analysis, could you please tell me how I can analyse these sentences, and which tool, library, or module I would need? "I nostri test di laboratorio ti permettono di confrontare le migliori marche di Condizionatori…" ("Our laboratory tests let you compare the best brands of air conditioners…")
user12907213
3 votes · 1 answer

Using fasttext pre-trained models as an Embedding layer in Keras

My goal is to create a text generator that will generate non-English text based on a training set I provide. I'm currently at the stage of figuring out what the model should actually look like. I'm trying to implement fastText pre-trained…
pawcio
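The usual recipe for this is to build a vocabulary index, fill a weight matrix row by row from the pretrained vectors, and hand that matrix to the Embedding layer as its initial weights. A minimal sketch, where the tiny `pretrained` dict is a hypothetical stand-in for vectors loaded from a fastText .vec file:

```python
pretrained = {                      # hypothetical 4-dimensional vectors
    "hello": [0.1, 0.2, 0.3, 0.4],
    "world": [0.5, 0.6, 0.7, 0.8],
}
dim = 4
vocab = ["<pad>", "hello", "world", "unknown"]
word_index = {w: i for i, w in enumerate(vocab)}

# Rows default to zeros for padding and out-of-vocabulary words.
matrix = [[0.0] * dim for _ in vocab]
for word, idx in word_index.items():
    if word in pretrained:
        matrix[idx] = pretrained[word]

# In Keras this matrix would then seed the layer, roughly:
# Embedding(len(vocab), dim, weights=[matrix], trainable=False)
print(matrix[word_index["hello"]])
```

Setting `trainable=False` keeps the pretrained vectors frozen; leave it trainable if you want them fine-tuned along with the rest of the model.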
3 votes · 1 answer

How to use GridSearchCV (python) for maximizing or minimizing a function with parameters?

I would like to maximize a function func(minCount, wordNgrams, lr, epoch, loss) with grid search over only these values: `{'minCount': [2, 3], 'wordNgrams': [1, 2, 3, 4, 5], 'lr': [0.1, 0.01, 0.001, 0.0001], 'epoch': [5, 10, 15, 20, 25, 30], 'loss':…`
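Since GridSearchCV expects a scikit-learn estimator rather than a bare function, a plain loop over `itertools.product` is often the simpler tool for this. A minimal sketch, with a toy scoring function standing in for the real `func`:

```python
from itertools import product

grid = {
    "minCount": [2, 3],
    "wordNgrams": [1, 2, 3],
    "lr": [0.1, 0.01],
}

def score(params):
    """Hypothetical objective to maximize; replace with the real evaluation."""
    return params["wordNgrams"] * params["lr"] - 0.01 * params["minCount"]

best_params, best_score = None, float("-inf")
keys = list(grid)
for values in product(*(grid[k] for k in keys)):
    params = dict(zip(keys, values))
    s = score(params)
    if s > best_score:
        best_params, best_score = params, s

print(best_params, best_score)
```

For minimization, flip the comparison (or negate the score). Note the full grid in the question is large; random search over the same ranges is a common way to cut the number of training runs.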
3 votes · 1 answer

Speed Up Gensim's Word2vec for a Massive Dataset

I'm trying to build a Word2vec (or FastText) model using Gensim on a massive dataset composed of 1000 files, each containing ~210,000 sentences, with each sentence containing ~1000 words. The training was run on a 185 GB RAM, 36-core machine. I…
Kamaney
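For a corpus this size, the standard gensim pattern is a restartable iterable that streams one tokenized sentence at a time from disk, so multiple training epochs never need the whole corpus in memory. A minimal sketch:

```python
import os

class SentenceStream:
    """Streams tokenized sentences from every text file in a directory."""

    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        # Re-opens the files on every pass, which gensim's multi-epoch
        # training relies on (a plain generator would be exhausted after one).
        for name in sorted(os.listdir(self.dirname)):
            with open(os.path.join(self.dirname, name), encoding="utf-8") as f:
                for line in f:
                    tokens = line.split()
                    if tokens:
                        yield tokens

# Usage with gensim would look roughly like:
# model = Word2Vec(SentenceStream("corpus_dir"), workers=36)
```

Raising `workers` toward the core count helps, but tokenization in the iterator can become the bottleneck; pre-tokenizing the files once so `__iter__` only splits on whitespace is a cheap win.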
3 votes · 1 answer

FastTextKeyedVectors difference between vectors, vectors_vocab and vectors_ngrams instance variables

I downloaded wiki-news-300d-1M-subword.bin.zip and loaded it as follows: import gensim print(gensim.__version__) model = gensim.models.fasttext.load_facebook_model('./wiki-news-300d-1M-subword.bin') print(type(model)) model_keyedvectors =…
abhinavkulkarni
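The key idea behind these arrays is that fastText composes a vector for an out-of-vocabulary word from the vectors of its character n-grams (what `vectors_ngrams` stores), while whole-word vectors live in `vectors_vocab`. A minimal sketch of that composition, with a toy n-gram table standing in for the real arrays:

```python
def ngrams(word, minn=3, maxn=4):
    """Character n-grams with fastText's '<'/'>' boundary markers."""
    wrapped = "<" + word + ">"
    return [wrapped[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(wrapped) - n + 1)]

# Toy 2-d vectors keyed by n-gram; a stand-in for vectors_ngrams.
ngram_vectors = {g: [float(len(g)), 1.0] for g in ngrams("cat")}

def oov_vector(word, table, dim=2):
    """Average the vectors of the word's known character n-grams."""
    hits = [table[g] for g in ngrams(word) if g in table]
    if not hits:
        return [0.0] * dim
    return [sum(v[i] for v in hits) / len(hits) for i in range(dim)]

print(oov_vector("cat", ngram_vectors))
```

(The real implementation hashes n-grams into a fixed-size bucket table rather than looking up strings, but the averaging step is the same idea.)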
3 votes · 1 answer

Memory-efficient loading of pretrained word embeddings from the fasttext library with gensim

I would like to load pretrained multilingual word embeddings from the fasttext library with gensim; here is the link to the embeddings: https://fasttext.cc/docs/en/crawl-vectors.html In particular, I would like to load the following word embeddings:…
lux7
3 votes · 1 answer

fastText pre-trained sentence similarity

I want to use fastText pre-trained models to compute the similarity between a sentence and a set of sentences. Can anyone help me? What is the best approach? I computed the similarity between sentences by training a tf-idf model, writing code like this. Is it…
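A common baseline for this: average the word vectors of each sentence and compare the averages with cosine similarity. A minimal sketch, where the `vectors` dict is a toy stand-in for real fastText embeddings:

```python
from math import sqrt

vectors = {                      # hypothetical 3-d word vectors
    "cats": [1.0, 0.0, 0.0],
    "dogs": [0.9, 0.1, 0.0],
    "stocks": [0.0, 0.0, 1.0],
}

def sentence_vector(sentence, dim=3):
    """Mean of the known word vectors in the sentence."""
    words = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not words:
        return [0.0] * dim
    return [sum(v[i] for v in words) / len(words) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

print(cosine(sentence_vector("cats"), sentence_vector("dogs")))    # high
print(cosine(sentence_vector("cats"), sentence_vector("stocks")))  # 0.0
```

Averaging ignores word order; weighting each word's vector by its tf-idf score is a common refinement that combines the two approaches the question mentions.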
3 votes · 1 answer

Is it possible to fine-tune FastText models?

I'm working on a project for text similarity using FastText. The basic example I have found to train a model is: from gensim.models import FastText model = FastText(tokens, size=100, window=3, min_count=1, iter=10, sorted_vocab=1) As I understand…
Luis Ramon Ramirez Rodriguez
3 votes · 0 answers

Using subword information in OOV token from fasttext in word embedding layer (keras/tensorflow)

I have my own fastText model and used it to train a Keras classification model with a word embedding layer. But I wonder how I can make use of the subword information of my model for OOV words, since the word embedding layer operates via indices to…
ctiid
3 votes · 2 answers

How to disable subwords embedding training when using fasttext?

Here is a snippet of the corpus I try to use for training word embedding. news_subent_12402 news_dlsub_00322 news_dlsub_00001 news_sub_00035 news_subent_07737 news_sub_00038 news_dlsub_00925 news_subent_07934 news_sub_00057 news_dlsub_01826…
yanachen
3 votes · 2 answers

Implementing Word to vector model using Gensim

We are trying to implement a word vector model for the set of words given below. stemmed = ['data', 'appli', 'scientist', 'mgr', 'microsoft', 'hire', 'develop', 'mentor', 'team', 'data', 'scientist', 'defin', 'data', 'scienc', 'prioriti', 'deep',…
3 votes · 0 answers

Emscripten: how to build a C++ project with headers

I want to convert this C++ project (Facebook FastText):
├── args.cc
├── args.h
├── dictionary.cc
├── dictionary.h
├── fasttext.cc
├── fasttext.h
├── main.cc
├── matrix.cc
├── matrix.h
├── model.cc
├── model.h
├── productquantizer.cc
├──…
loretoparisi
3 votes · 1 answer

Can I tokenize using spaCy and then extract vectors for these tokens using pre-trained word embeddings from fastText?

I am tokenizing my text corpus, which is in German, using spaCy's German model. Since spaCy currently only has a small German model, I am unable to extract the word vectors using spaCy itself. So, I am using fastText's pre-trained word…
shasvat desai