I am trying to model topics using Mallet. I have repeatedly seen blog posts and research papers recommend limiting the number of words per document, in most cases to around 1000 words. That LDA needs a minimum number of words per document is clear, of course. However, is there actually a technical reason to split larger documents into smaller chunks? My documents range from 5k to 20k words. Would I be better off splitting, say, a 5k-word document into multiple documents?
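For concreteness, this is roughly the kind of splitting I have in mind. It is only a minimal sketch: the function names `split_into_chunks` and `to_mallet_lines`, the chunk size of 1000, and the placeholder label `none` are my own choices, and I am assuming the one-instance-per-line layout (name, label, text) that Mallet's `import-file` reads by default.

```python
def split_into_chunks(text, chunk_size=1000):
    """Split text into consecutive chunks of at most chunk_size words.

    Words are whatever str.split() yields (whitespace-separated tokens),
    so this ignores sentence and paragraph boundaries.
    """
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def to_mallet_lines(doc_id, text, chunk_size=1000, label="none"):
    """Format each chunk as one line: name, label, text (tab-separated).

    Assumption: this matches the default one-instance-per-line input
    expected by Mallet's import-file command. The label is a placeholder.
    """
    return [
        f"{doc_id}_chunk{n}\t{label}\t{chunk}"
        for n, chunk in enumerate(split_into_chunks(text, chunk_size))
    ]


if __name__ == "__main__":
    # Hypothetical 5k-word document made of dummy tokens.
    document = " ".join(f"word{i}" for i in range(5000))
    for line in to_mallet_lines("doc1", document):
        name, label, chunk = line.split("\t")
        print(name, len(chunk.split()))
```

With this, a 5k-word document would become five pseudo-documents of 1000 words each, and the last chunk of a document is simply shorter when the length is not a multiple of the chunk size.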
Many thanks in advance!