I'm running textsum decoding on a small test set (5 examples), but the reference and decode files have already grown to thousands of lines each. Why does decoding seem to run indefinitely? Is it processing the same set of examples repeatedly? Are later outputs supposed to be better than earlier ones?
I'd love some intuition on this; I haven't been able to find a clear explanation.