I'm using an NER system that outputs a text file containing a list of named entities, which are instances of the concept Speaker. I'm looking for a tool that can compute the system's precision, recall, and F1, taking as input this list and the gold standard, in which the instances are correctly annotated with <Speaker> tags.
I have two text files: instances.txt and GoldStandard.txt. I need to compare the extracted instances with the gold standard in order to calculate these metrics. For example, according to the second file, the first three lines in the first file are true positives and the last line is a false positive.
instances.txt contains:
is sponsoring a lecture by <speaker> Antal Bejczy from
announces a talk by <speaker> Julia Hirschberg
His name is <speaker> Toshiaki Tsuboi He will
to produce a schedule by <speaker> 50% for problems
GoldStandard.txt contains:
METC is sponsoring a lecture by <speaker> Antal Bejczy from Stanford university
METC announces a talk by <speaker> Julia Hirschberg
The speaker is from USA His name is <speaker> Toshiaki Tsuboi He will
propose a solution to these problems
It led to produce a schedule by 50% for problems
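
For illustration, the comparison described above could be sketched in Python roughly as follows. This is not an existing tool, only a minimal sketch under some assumptions: an extracted line counts as a true positive if, after whitespace normalization, it occurs as a substring of a gold line that contains the `<speaker>` tag; any other extracted line is a false positive; and a tagged gold line matched by no extracted line is a false negative. The helper names `normalize` and `score` are hypothetical, and the sample data is copied from the two files above.

```python
def normalize(line):
    # Collapse runs of whitespace so spacing differences do not block a match.
    return " ".join(line.split())


def score(instance_lines, gold_lines, tag="<speaker>"):
    # Assumption: only gold lines carrying the tag hold a correct instance.
    instances = [normalize(l) for l in instance_lines if l.strip()]
    gold = [normalize(l) for l in gold_lines if tag in l]

    matched = set()  # indices of gold lines already paired with an instance
    tp = fp = 0
    for inst in instances:
        # Assumption: an instance is correct if it is a substring of an
        # as yet unmatched tagged gold line.
        hit = next(
            (i for i, g in enumerate(gold) if i not in matched and inst in g),
            None,
        )
        if hit is None:
            fp += 1
        else:
            tp += 1
            matched.add(hit)

    fn = len(gold) - len(matched)  # tagged gold lines the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


instances = [
    "is sponsoring a lecture by <speaker> Antal Bejczy from",
    "announces a talk by <speaker> Julia Hirschberg",
    "His name is <speaker> Toshiaki Tsuboi He will",
    "to produce a schedule by <speaker> 50% for problems",
]
gold = [
    "METC is sponsoring a lecture by <speaker> Antal Bejczy from Stanford university",
    "METC announces a talk by <speaker> Julia Hirschberg",
    "The speaker is from USA His name is <speaker> Toshiaki Tsuboi He will",
    "propose a solution to these problems",
    "It led to produce a schedule by 50% for problems",
]

precision, recall, f1 = score(instances, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.75 recall=1.00 f1=0.86
```

With the real files, the two lists could be read with something like `open("instances.txt").read().splitlines()`. The substring rule is the simplest choice that reproduces the example (three true positives, one false positive); a stricter evaluation would compare only the tokens that follow the tag.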