MALLET generates a tab-separated file with the topic distribution of each document when the --output-doc-topics parameter
is passed while training the topic model. The file looks roughly like this:
doc# filename topic# weight
0 file:/.../document_01.txt 3 0.2110215053763441 14 0.1330645161 ...
However, I need this file sorted differently for further processing. Right now, the topic/weight pairs in each row are ordered by descending topic weight (0.211..., 0.133..., etc.). Is it also possible to order them by ascending topic number (0, 1, 2, ...), each followed by its corresponding weight?
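To make the desired output concrete, here is a minimal Python sketch of the per-row reordering I have in mind. It is not MALLET functionality. It assumes the fields are tab-separated as described above, that every data row starts with a numeric document index followed by the filename, and that the remaining fields come in topic/weight pairs. The names reorder_line and reorder_file are hypothetical.

```python
def reorder_line(line: str) -> str:
    """Return one data row with its (topic, weight) pairs sorted by topic number."""
    fields = line.rstrip("\n").split("\t")
    doc_id, name = fields[0], fields[1]
    # Ignore empty fields, e.g. from a trailing tab at the end of the row.
    rest = [f for f in fields[2:] if f != ""]
    # Assumption: the remaining fields alternate topic number, weight.
    pairs = list(zip(rest[0::2], rest[1::2]))
    pairs.sort(key=lambda pair: int(pair[0]))
    out = [doc_id, name]
    for topic, weight in pairs:
        # Weights stay strings, so their digits are copied unchanged.
        out.extend((topic, weight))
    return "\t".join(out)


def reorder_file(src_path: str, dst_path: str) -> None:
    """Stream src_path row by row and write the reordered rows to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            first_field = line.split("\t", 1)[0].strip()
            if not first_field.isdigit():
                # Header or blank line: copy it through untouched.
                dst.write(line)
                continue
            dst.write(reorder_line(line) + "\n")
```

Because each row is handled independently, only one row is held in memory at a time, so the size of the file should not be a limiting factor for an approach like this.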
Initially, I thought the sorting could be done with Excel, but the file is just too large (> 20 GB).
Is there a MALLET parameter for this? I have already looked through the --help
output, but did not find anything relevant.
Otherwise, could you recommend a tool or API that is capable of this kind of sorting?
Thank you!