I'm using the official fastText Python library (v0.9.2) for intent classification.
import fasttext
model = fasttext.train_supervised(input='./test.txt',
                                  loss='softmax',
                                  dim=200,
                                  bucket=2000000,
                                  epoch=25,
                                  lr=1.0)
Here test.txt contains just one sample:
__label__greetings hi
When I predict two utterances, the results are:
print(model.words)
print('hi', model.predict('hi'))
print('bye', model.predict('bye'))
app_1 | ['hi']
app_1 | hi (('__label__greetings',), array([1.00001001]))
app_1 | bye ((), array([], dtype=float64))
This is the output I expect. However, if I provide two samples for the same label:
__label__greetings hi
__label__greetings hello
the result for the out-of-vocabulary (OOV) utterance is not what I expect:
app_1 | ['hi', '</s>', 'hello']
app_1 | hi (('__label__greetings',), array([1.00001001]))
app_1 | bye (('__label__greetings',), array([1.00001001]))
I understand that the problem is with the </s> token (maybe it comes from the \n in the text file?): when none of the words in the text are in the vocabulary, the text is replaced by </s>. Is there any training option or other way to avoid this behavior?
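The only workaround I can think of is a guard on the caller side. This is a minimal sketch, assuming whitespace tokenization and a model trained without subword n-grams; `predict_known` is a hypothetical helper of my own, not part of fastText, and it only relies on `model.words` and `model.predict`:

```python
EOS = "</s>"  # end-of-sentence token that shows up in model.words


def predict_known(model, text, k=1):
    """Hypothetical helper: predict only when at least one token is in the vocabulary.

    Returns an empty result, (), (), for text made up entirely of OOV tokens.
    """
    # Ignore the EOS token so it cannot count as a known word.
    vocab = set(model.words) - {EOS}
    # Assumption: plain whitespace splitting is close enough to fastText's tokenization.
    tokens = text.split()
    if not any(token in vocab for token in tokens):
        return (), ()
    return model.predict(text, k=k)
```

With this, `predict_known(model, 'bye')` would return an empty result instead of `__label__greetings`, but I would prefer a solution inside the library itself.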
Thanks!