I'm a little confused after reading the paper "Bag of Tricks for Efficient Text Classification".
What is the difference between the args wordNgrams, minn, and maxn?
For example, consider a text classification task with GloVe embeddings as pretrainedVectors:
ft.train_supervised(file_path, lr=0.1, epoch=5, wordNgrams=2, dim=300, loss='softmax', minn=2, maxn=3, pretrainedVectors='glove.300d.txt', verbose=0)
Suppose the input sentence is 'I love you'.
Given minn=2, maxn=3, each word is wrapped in boundary symbols and decomposed into character n-grams of length 2 to 3, so the sentence becomes [<I, I>], [<l, <lo, lo, lov, ...], etc.
For the word 'love', its fastText embedding = (emb(love) (as a complete word) + emb(<l) + emb(<lo) + ...) / n, where n is the number of pieces.
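If I put my understanding of minn/maxn into code, it would look roughly like this (just a minimal sketch of what I think happens, not the real implementation: the lookup table is made up, and I know the actual library hashes n-grams into a fixed number of buckets):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = {}  # hypothetical lookup: token (word or char n-gram) -> 300-d vector

def lookup(token):
    if token not in emb:
        emb[token] = rng.standard_normal(300)
    return emb[token]

def char_ngrams(word, minn=2, maxn=3):
    token = f"<{word}>"  # add boundary symbols
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]

def word_vector(word):
    # whole-word vector plus all of its char n-gram vectors, averaged
    pieces = [word] + char_ngrams(word)
    return np.mean([lookup(p) for p in pieces], axis=0)

print(char_ngrams("I"))            # ['<I', 'I>', '<I>']
print(char_ngrams("love"))         # ['<l', 'lo', 'ov', 've', 'e>', '<lo', 'lov', 'ove', 've>']
print(word_vector("love").shape)   # (300,)
```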
The sentence is then split into [I love, love you] (because wordNgrams=2), and these 2-gram embeddings are [(fasttext emb(I) + fasttext emb(love))/2, (fasttext emb(love) + fasttext emb(you))/2].
The sentence embedding is the average of the 2-gram embeddings and has dimensionality 300. It is then fed through a layer with #labels neurons (i.e., multiplied by a matrix of size [300, #labels]), as in the sketch below.
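Again as a sketch of my mental model (reusing rng and word_vector from the snippet above; n_labels and W are made-up stand-ins for the output layer, not anything from the library):

```python
def sentence_vector(sentence, n=2):
    words = sentence.split()
    # word bigrams because wordNgrams=2: [['I', 'love'], ['love', 'you']]
    ngrams = [words[i:i + n] for i in range(len(words) - n + 1)]
    # each bigram embedding = average of its words' fastText vectors
    ngram_vecs = [np.mean([word_vector(w) for w in ng], axis=0) for ng in ngrams]
    # sentence embedding = average of the bigram embeddings, still 300-d
    return np.mean(ngram_vecs, axis=0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_labels = 4                                  # arbitrary label count for illustration
W = rng.standard_normal((300, n_labels))      # output layer, size [300, #labels]
probs = softmax(sentence_vector("I love you") @ W)
print(probs.shape)                            # (n_labels,)
```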
Is this right? Please correct me if I'm wrong.