You need to specify the dictionary key ('rouge1') if you want to access the Score object directly.
So scores.append(scorer.score(hyp,ref)) becomes scores.append(scorer.score(hyp,ref)['rouge1']).
The following code is a more general version that computes the ROUGE metric for each document and stores the results separately in a single dictionary:
# import the scorer from the rouge_score package
from rouge_score import rouge_scorer
# a list of the hypothesis documents
hyp = ['This is the first sample', 'This is another example']
# a list of the reference documents
ref = ['This is the first sentence', 'It is one more sentence']
# make a RougeScorer object with rouge_types=['rouge1']
scorer = rouge_scorer.RougeScorer(['rouge1'])
# a dictionary that will contain the results
results = {'precision': [], 'recall': [], 'fmeasure': []}
# for each pair of hypothesis and reference documents
for (h, r) in zip(hyp, ref):
    # compute ROUGE; rouge_score expects score(target, prediction),
    # so the reference goes first and the hypothesis second
    score = scorer.score(r, h)
    # unpack the measurements from the Score tuple
    precision, recall, fmeasure = score['rouge1']
    # append them to the proper list in the dictionary
    results['precision'].append(precision)
    results['recall'].append(recall)
    results['fmeasure'].append(fmeasure)
The output will look like the following:
{'fmeasure': [0.8000000000000002, 0.22222222222222224],
 'precision': [0.8, 0.25],
 'recall': [0.8, 0.2]}
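To see where the first document's scores come from, here is a minimal sketch of ROUGE-1 as plain unigram overlap. The function name rouge1 and the lowercase-and-split tokenization are my own simplifications for illustration; this is not the rouge_score implementation, which uses its own tokenizer and optional stemming, so results can differ on punctuated text.

```python
from collections import Counter


def rouge1(reference, hypothesis):
    """Simplified ROUGE-1: unigram-overlap precision, recall and F-measure.

    Hypothetical helper for illustration only, not part of rouge_score.
    """
    # Simplified tokenization (an assumption): lowercase, split on whitespace
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())

    # Clipped overlap: a token counts at most as often as it occurs in both
    overlap = sum((ref_counts & hyp_counts).values())

    # Precision is relative to the hypothesis, recall to the reference
    precision = overlap / max(sum(hyp_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)

    if overlap == 0:
        return 0.0, 0.0, 0.0
    fmeasure = 2 * precision * recall / (precision + recall)
    return precision, recall, fmeasure


# First pair from the example: 4 of 5 tokens match on both sides
p, r, f = rouge1('This is the first sentence', 'This is the first sample')
print(p, r, f)
```

Four of the five tokens are shared and both texts have five tokens, so precision and recall are both 4/5 = 0.8 for this pair.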
I would also suggest the rouge library, which is another implementation of the ROUGE paper. Its results may differ slightly, but it adds some useful features, including the ability to compute ROUGE metrics by passing in whole text documents and to average the results over all the documents.
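If you prefer to stay with rouge_score, the same kind of average can be computed by hand from a results dictionary of the shape built earlier. This is a minimal sketch using only the standard library; the values in results are made up for illustration, and average_scores is a hypothetical helper name, not part of either library.

```python
from statistics import mean

# Illustrative per-document scores (made-up values, same shape as `results`)
results = {
    'precision': [0.8, 0.5],
    'recall': [0.6, 0.4],
    'fmeasure': [0.7, 0.45],
}


def average_scores(per_document):
    """Average each metric's list of per-document values (hypothetical helper)."""
    return {metric: mean(values) for metric, values in per_document.items()}


averages = average_scores(results)
print(averages)
```

This is a simple macro-average: every document contributes equally regardless of its length.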