I'm working on a auto summarization system and I want to evaluate my output summary with my gold summaries. I have multiple summaries with different length for each case. So I'm a little confused in here. my question is that how should I evaluate my summary with these gold summaries. should I evaluate mine with each gold summary then average the results or assume union of gold summaries as gold summary then evaluate mine with that?
Thank you in advance