I'm currently preparing an NER task with Flair and I'm looking for information about the metrics used for NER.
What are the most commonly used metrics, and how should I interpret them?
I would suggest checking sklearn-crfsuite's documentation if you want an out-of-the-box implementation.
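For interpretation, treat your NER system as a multiclass classification system: each token is assigned one label, and you read precision, recall and F1 for each label just as you would for any other classifier.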
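To make the multiclass view concrete, here is a minimal sketch in plain Python that computes token-level precision, recall and F1 per label. The function name `per_label_scores`, the label set and the toy sequences are made up for illustration; they do not come from Flair or sklearn-crfsuite. It also assumes the "O" (outside) label is left out of the report, which is a common choice because "O" usually dominates the data.

```python
from collections import Counter


def per_label_scores(y_true, y_pred, ignore=("O",)):
    """Token-level precision, recall and F1 for each label (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1   # correct label
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[gold] += 1   # gold label was missed

    labels = sorted((set(y_true) | set(y_pred)) - set(ignore))
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]   # tokens predicted as this label
        actual = tp[label] + fn[label]      # tokens that truly have this label
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return scores


# Toy example: one flat list of gold labels and one of predictions.
y_true = ["PER", "PER", "O", "LOC", "O", "ORG"]
y_pred = ["PER", "O", "O", "LOC", "LOC", "PER"]

scores = per_label_scores(y_true, y_pred)
for label, s in scores.items():
    print(label, s)
```

Precision answers "of the tokens I tagged as PER, how many really are PER?", recall answers "of the real PER tokens, how many did I find?", and F1 is their harmonic mean. Note that this counts individual tokens; NER results are often reported at the entity level instead, where a multi-token entity only counts as correct if its whole span and type match, so entity-level scores are usually lower than token-level ones.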