I am trying to use fastText to label some data as [URL] or [PN], just to test it. After training on 6k examples of each label, it keeps predicting [PN] for every input.

training command

fasttext supervised -input input.txt -output model -minn 0 -maxn 0 -epoch 100 -lr 0.1

sample training data

__label__PN 5962-8904XA
__label__PN 585DD4P54ZP
__label__PN GQ0B11400FCT
__label__URL http://ws.com/qd/lat/ispls32883.pdf
__label__URL http://ws.com/pdfs//2004/0423/ds/m412b.pdf
__label__URL http://ws.com/pdfs//2004/0423/mc68.pdf

sample test data

945
74ACT399MTC
http://www.msn.com/mylink.pdf
MQ8797BH
74AC1153
ICL762PA+
54LS3482A
54LS76A/B
54HC27/A
www.google.com
Exorcismus

1 Answer

fastText is based on word n-grams, which means it expects a complete sentence as input.

In your example, each input line is a single token (a unigram), so depending on the wordNgrams value you set in the parameters, the model is not able to learn anything useful from it.

ELI5: the algorithm is saying, "I can learn from complex sentences because of the structure of the words and how they combine, but you are sending me only single words. I cannot handle that."
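
To make the point concrete, here is a minimal sketch in plain Python. It is not fastText's implementation, and the helper names (`word_features`, `char_ngram_features`, `vocabulary`) are made up for illustration. It only compares two ways of turning a line into features, using the sample data from the question: whole tokens, and character n-grams of each token. Note that the training command in the question passes -minn 0 -maxn 0, which, as far as I know, switches character n-grams off.

```python
# Illustrative only: mimics the idea of whole-word vs character n-gram
# features. The function names are hypothetical, not part of fastText.

def word_features(line):
    """Whole-token features: each whitespace-separated token is one feature."""
    return set(line.split())

def char_ngram_features(line, min_n=2, max_n=4):
    """Character n-grams of each token, padded with < and > boundary markers."""
    feats = set()
    for token in line.split():
        padded = f"<{token}>"
        for n in range(min_n, max_n + 1):
            for i in range(len(padded) - n + 1):
                feats.add(padded[i:i + n])
    return feats

def vocabulary(lines, featurize):
    """Union of the features of all training lines."""
    vocab = set()
    for line in lines:
        vocab |= featurize(line)
    return vocab

# Sample training lines from the question (labels stripped).
train = [
    "5962-8904XA",
    "585DD4P54ZP",
    "GQ0B11400FCT",
    "http://ws.com/qd/lat/ispls32883.pdf",
    "http://ws.com/pdfs//2004/0423/ds/m412b.pdf",
    "http://ws.com/pdfs//2004/0423/mc68.pdf",
]
# One of the sample test lines from the question.
test = "http://www.msn.com/mylink.pdf"

word_overlap = word_features(test) & vocabulary(train, word_features)
char_overlap = char_ngram_features(test) & vocabulary(train, char_ngram_features)

print("shared whole-word features:", len(word_overlap))
print("shared character n-grams:", len(char_overlap))
```

With whole tokens, the test line shares nothing with the training lines, because it never appeared verbatim in training. With character n-grams, pieces such as "<ht" and ".pd" are shared, so there is something to generalize from. If that is the cause here, training with non-zero -minn and -maxn values may be worth trying.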

Flavio