How to evaluate auto summary generated with gold summaries with Rouge metric?

Question

I'm working on an automatic summarization system and I want to evaluate my output summaries against my gold summaries. For each case I have multiple gold summaries of different lengths, so I'm a little confused here. My question is: how should I evaluate my summary against these gold summaries? Should I evaluate it against each gold summary separately and then average the results, or treat the union of the gold summaries as a single gold summary and evaluate against that?

Thank you in advance

You can use the ROUGE toolkit, a Java package for automatic summary evaluation. The length of your summary is not important: ROUGE evaluates your summary based on the rate of n-gram overlap between your summary and the gold summaries. — Mahsa, Dec 08 '18 at 10:56
Thanks, but my question is that I have several different gold summaries for each case, so I don't know whether to evaluate my summary against each gold summary separately and then average the scores, or to treat the union of the gold summaries as a single gold summary and evaluate my summary against that. — Ramin Fatourehchi, Dec 09 '18 at 14:00

score 0 · Answer 1 · answered Dec 11 '18 at 09:23

The ROUGE measure compares your summary with all of the reference summaries together.

For example, ROUGE-N is computed as the number of n-gram matches between your summary and each reference summary, summed over all of the references, divided by the total number of n-grams occurring in all of the reference summaries.
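Here is a minimal Python sketch of that multi-reference ROUGE-N recall, following the formula in Lin's 2004 ROUGE paper as I understand it. It assumes plain lowercased whitespace tokenization with no stemming or stopword removal (real toolkits offer more options), and the function names `ngrams` and `rouge_n_multi` are hypothetical, not part of any library. Matches are clipped and counted against each reference separately, then pooled over all references:

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_multi(candidate, references, n=1):
    """Multi-reference ROUGE-N recall (sketch, hypothetical helper).

    Each reference n-gram matches at most min(count in reference, count in
    candidate) times. Matches are summed over all references and divided by
    the total n-gram count of all references. Tokenization is lowercased
    whitespace splitting (an assumption).
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clipped overlap with this one reference
        matched += sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
        # All n-grams in this reference count toward the denominator
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


candidate = "the cat sat on the mat"
references = ["the cat is on the mat", "a cat sat on a mat today"]
print(round(rouge_n_multi(candidate, references, n=1), 3))  # 0.692 (9/13)
print(round(rouge_n_multi(candidate, references, n=2), 3))  # 0.455 (5/11)
```

Because the counts are pooled, a longer reference carries more weight than a shorter one, so the result can differ from averaging per-reference scores: in this example pooling gives 9/13 ≈ 0.692 for ROUGE-1, while averaging 5/6 and 4/7 gives about 0.702.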

This paper on ROUGE will help you.
