
I am new to fastText, and I already have a few questions about this library. They may seem obvious to some, but I really want to get the right intuition. Your help will be much appreciated.

First of all, I'm talking about the text classification part of fastText. According to the tutorial which is provided here, we are predicting different labels for a given text. Is it true that we actually assign every label to the given test text, each with the probability that the text fits that label?

And the second question: can anyone clarify/explain to me the meaning of P@1 (precision at 1) and R@1 (recall at 1), the metrics used in fastText, in this context? I found one answer here, but that answer raised even more questions:

  • In the answer provided by that link, what are P@1 and R@1 then? According to the logic and explanation there, P@1 is the precision with one returned result (in our context, one label), which is either correct or incorrect, so for a single text P@1 can only take the value 0 or 1, right? How do we get a probability then? Should we just calculate the share of 1's over all text samples? If so, what is R@1, and how is it calculated in this case? And what is R@k generally in this context? (I sketch my current understanding in code below.)

And what are P@1 and R@1 in the example provided by the tutorial? There they calculated P@5 and R@5, right?
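To make my confusion concrete, here is a minimal sketch (in Python, with made-up per-sample results, so purely hypothetical) of how I currently picture P@1 being averaged over a test set; please correct me if this is wrong:

```python
# Hypothetical illustration of my current understanding, NOT fastText's actual code.
# For each test sample, the single top-ranked label is either correct (1) or not (0);
# I assume P@1 is then the average of these 0/1 outcomes over all samples.
per_sample_hits = [1, 0, 1, 1, 0]  # made-up top-1 results for 5 test samples

p_at_1 = sum(per_sample_hits) / len(per_sample_hits)
print(p_at_1)  # 0.6 -- is this the value reported as P@1?
```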

Thanks a lot in advance,

Dilshat

2 Answers


Yes, the different labels are assigned a probability. You can see the probability for each label by running the following command, where my_model.bin and data.test are replaced with the appropriate names and k is the number of top labels you want probabilities for (set it to the number of distinct labels in your data set to see them all):

./fasttext predict-prob my_model.bin data.test k
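If you use the Python bindings instead of the command line, a minimal sketch of the same idea (assuming a trained model saved as my_model.bin; the file name and example sentence are just placeholders) would be:

```python
import fasttext

# Load a previously trained supervised model (the file name here is an assumption).
model = fasttext.load_model("my_model.bin")

# Ask for the top k labels and their probabilities for a single piece of text.
labels, probs = model.predict("which baking dish is best to bake a banana bread ?", k=5)
for label, prob in zip(labels, probs):
    print(label, prob)
```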

ceek

Firstly, precision is the ratio of the number of correctly predicted labels to the number of labels predicted by the model, while @k means that only the model's k highest-ranked predictions are considered; P@1 therefore looks at the single highest-probability label (it has nothing to do with epochs). Secondly, recall is the ratio of the number of correctly predicted labels to the number of actual labels in the validation data set.

e.g., actual labels for an input in the data set: A, B, C, D, E

Predicted labels for that input from the model (its top 4, i.e. k = 4): A, B, C, G

Correctly predicted labels: A, B, C

Precision: 3 / 4 = 0.75

Recall: 3 / 5 = 0.6
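A small self-contained sketch (a hypothetical helper, not fastText's own code) that reproduces these numbers:

```python
def precision_recall_at_k(actual, predicted, k):
    """Compute P@k and R@k for one sample given its actual and predicted labels."""
    top_k = predicted[:k]                    # keep only the k highest-ranked predictions
    correct = len(set(top_k) & set(actual))  # labels that are both predicted and actual
    precision = correct / len(top_k)         # correct / number of predicted labels
    recall = correct / len(actual)           # correct / number of actual labels
    return precision, recall

p, r = precision_recall_at_k(
    actual=["A", "B", "C", "D", "E"],
    predicted=["A", "B", "C", "G"],
    k=4,
)
print(p, r)  # 0.75 0.6
```

In fastText itself, model.test("data.test", k=1) returns the number of test samples together with P@1 and R@1 averaged over the whole test set.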