
I'm new to fastText and have read the tutorial: https://fasttext.cc/docs/en/supervised-tutorial.html.

I downloaded the sample data and noticed that the labels are plain strings.

$ head cooking.stackexchange.txt   
                                                           
__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?
__label__food-safety __label__acidity Dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove How do I cover up the white spots on my cast iron stove?
__label__restaurant Michelin Three Star Restaurant; but if the chef is not there
__label__knife-skills __label__dicing Without knife skills, how can I quickly and accurately dice vegetables?
__label__storage-method __label__equipment __label__bread What's the purpose of a bread box?
__label__baking __label__food-safety __label__substitutions __label__peanuts how to seperate peanut oil from roasted peanuts at home?
__label__chocolate American equivalent for British chocolate terms
__label__baking __label__oven __label__convection Fan bake vs bake
__label__sauce __label__storage-lifetime __label__acidity __label__mayonnaise Regulation and balancing of readymade packed mayonnaise and other sauces
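To make the format concrete, here is a minimal sketch of how one of these lines splits into labels and text, assuming the default `__label__` prefix. It is a simplified reading of the format, not fastText's own parser; `parse_line` is a hypothetical helper. The point is that the labels stay as ordinary strings, identified only by their prefix.

```python
LABEL_PREFIX = "__label__"  # fastText's default label prefix

def parse_line(line, prefix=LABEL_PREFIX):
    """Split one line of fastText training data into (labels, text)."""
    labels, words = [], []
    for token in line.split():
        if token.startswith(prefix):
            # Any token carrying the prefix is treated as a label; keep the string part.
            labels.append(token[len(prefix):])
        else:
            words.append(token)
    return labels, " ".join(words)

labels, text = parse_line(
    "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?"
)
print(labels)  # ['sauce', 'cheese']
print(text)    # How much does potato starch affect a cheese sauce recipe?
```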

I also ran the training and testing code from the tutorial:

>>> model = fasttext.train_supervised(input="cooking.train", lr=1.0)
Read 0M words
Number of words:  9012
Number of labels: 734
Progress: 100.0%  words/sec/thread: 81469  lr: 0.000000  loss: 6.405640  eta: 0h0m

>>> model.test("cooking.valid")
(3000L, 0.563, 0.245)
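Per the tutorial, the tuple returned by `model.test` is the number of examples, the precision at 1 and the recall at 1. A minimal sketch of those two metrics for multi-label data follows; it is a simplified re-computation for illustration, not fastText's implementation, and `precision_recall_at_k` is a hypothetical helper.

```python
def precision_recall_at_k(predictions, gold, k=1):
    """predictions: ranked label lists per example; gold: sets of true labels per example."""
    hits = predicted = relevant = 0
    for ranked, true_labels in zip(predictions, gold):
        top_k = ranked[:k]
        # Count predicted labels that are among the true labels.
        hits += sum(1 for label in top_k if label in true_labels)
        predicted += len(top_k)
        relevant += len(true_labels)
    return len(gold), hits / predicted, hits / relevant

n, p_at_1, r_at_1 = precision_recall_at_k(
    [["baking", "oven"], ["sauce", "cheese"]],
    [{"baking", "oven"}, {"cheese"}],
    k=1,
)
print(n, p_at_1, round(r_at_1, 3))  # 2 0.5 0.333
```

Recall is low relative to precision here because many questions carry several tags while only one is predicted at k=1.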

My question is: why aren't the labels encoded first with something like sklearn's LabelEncoder? I ran the example and it worked well, which confused me.
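For comparison, the string-to-integer mapping that a LabelEncoder performs can be built by the reader itself while it scans the training file, so a separate encoding step is not strictly needed. The sketch below assigns ids in first-seen order; this is an illustration under that assumption, not fastText's actual code, and the real ordering of its label dictionary may differ. `build_label_index` is a hypothetical helper.

```python
def build_label_index(lines, prefix="__label__"):
    """Assign an integer id to every distinct label string, in first-seen order."""
    label_to_id = {}
    for line in lines:
        for token in line.split():
            if token.startswith(prefix) and token not in label_to_id:
                label_to_id[token] = len(label_to_id)
    return label_to_id

index = build_label_index([
    "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?",
    "__label__food-safety __label__acidity Dangerous pathogens capable of growing in acidic environments",
])
print(index)
# {'__label__sauce': 0, '__label__cheese': 1, '__label__food-safety': 2, '__label__acidity': 3}
```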

[UPDATED] --------

IMO, the code would look like this:

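import fasttext
from sklearn import preprocessing

# load_dataset() is my own helper (not shown): it returns a list of texts
# and a parallel list with one string label per text.
texts_train, labels_train = load_dataset()

# Map each string label (e.g. "baking") to an integer id.
label_encoder = preprocessing.LabelEncoder()
labels_train = label_encoder.fit_transform(labels_train)

# Write the data back out in fastText's format, using the integer ids as labels.
with open('cooking.train.2', 'w') as f:
    for text, label in zip(texts_train, labels_train):
        f.write('%s __label__%d\n' % (text, label))

model = fasttext.train_supervised('cooking.train.2', lr=1.0)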
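One consequence of this approach: predictions would come back as `__label__<id>` and need decoding before they are readable. A minimal sketch, assuming `classes` holds what `label_encoder.classes_` would contain (LabelEncoder stores the sorted unique labels there; `label_encoder.inverse_transform` does the same lookup). `decode_label` and the sample `classes` list are hypothetical. Note also that this scheme keeps one label per text, whereas many lines in the cooking data carry several.

```python
def decode_label(fasttext_label, classes, prefix="__label__"):
    """Turn a predicted label like '__label__2' back into the original string."""
    return classes[int(fasttext_label[len(prefix):])]

# Hypothetical stand-in for label_encoder.classes_ (sorted unique labels).
classes = ["baking", "cheese", "sauce"]
print(decode_label("__label__2", classes))  # sauce
```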
Jack Tang