I am trying to understand the result evaluation table (table 1) of this paper.
There are three different accuracies reported overall
, unknown words (UW)
, known words (KW)
, and percentage of unknown words (% unk.)
.
Are the known words
the data that is used for training? And, are the unknown words
the data that is used for testing and validation?
What is the overall accuracy? How is it computed?
What is the percentage of unknown words % unk.
? Is it the percentage of the test set?
Thank you.