
I'm working on an automatic summarization system and I want to evaluate my output summaries against gold summaries. Each case has multiple gold summaries of different lengths, and I'm not sure how to handle that. Should I evaluate my summary against each gold summary separately and then average the results, or should I treat the union of the gold summaries as a single gold summary and evaluate against that?

Thank you in advance

  • You can use the ROUGE toolkit. This is a Java package for automatic summary evaluation. The length of your summary is not important. ROUGE evaluates your summary based on the rate of n-gram overlap between your summary and the gold summaries. – Mahsa Dec 08 '18 at 10:56
  • Thanks, but my question is that I have several different gold summaries for each case, so I don't know whether to evaluate my summary against each gold summary separately and then average the scores, or to treat the union of the gold summaries as the gold summary and evaluate against that. – Ramin Fatourehchi Dec 09 '18 at 14:00

1 Answer


The ROUGE measure compares your summary with all of the reference summaries.

For example, ROUGE-N is computed as the number of n-grams your summary shares with each reference summary, summed over all references, divided by the total number of n-grams occurring in all of the reference summaries.
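
As a rough illustration of the computation described above, here is a minimal Python sketch. It is not the ROUGE toolkit itself: the function names `ngrams` and `rouge_n_multi` are made up for this example, and lowercasing plus whitespace tokenization is an assumption (the real toolkit offers options such as stemming and stopword removal that are ignored here). Matches are clipped so that an n-gram is never counted more often than it appears in your summary.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_multi(candidate, references, n=1):
    """Recall-style ROUGE-N against several references (illustrative helper).

    Numerator: clipped n-gram matches, summed over every reference.
    Denominator: total number of n-grams in all references.
    Assumes lowercased, whitespace-separated tokens.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clip each match by how often the n-gram occurs in the candidate.
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


candidate = "the cat sat on the mat"
references = ["the cat is on the mat", "a cat sat on a mat"]
score = rouge_n_multi(candidate, references, n=1)
print(f"ROUGE-1 over {len(references)} references: {score:.2f}")
```

In this sketch the references are pooled at the level of n-gram counts, so you neither average per-reference scores yourself nor concatenate the gold summaries into one text.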

This paper on ROUGE will help you.

Mahsa