I'm trying to write some code to turn a Mallet instance list file into a document topics matrix in R. To do this, I read the instance list file into a topic trainer variable called 'topic.model'. Below is the function call I am making to create the document topics matrix in R:
theta <- mallet::mallet.doc.topics(topic.model, smoothed = TRUE, normalized = TRUE)
I got this working on a smaller instance list file (< 1 GB), but for a larger instance list (~15 GB) I receive the following error:
Error in .jcall(wrapper, "[D", "flat_double") :
java.lang.NegativeArraySizeException
Calls: myfunc ... .jevalArray -> newArray -> structure -> .jcall -> .jcheck
Execution halted
I suspect that somewhere there is an integer overflow: INT_MAX is exceeded, an array length wraps around to a negative value, and the NegativeArraySizeException occurs. Interestingly, on the command line Mallet was able to write the document topics file (>150 GB) using the --output-doc-topics parameter. Any suggestions would be greatly appreciated.
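
To make that suspicion concrete, here is a rough sanity check in R. The document and topic counts are hypothetical placeholders, not my real numbers. It also assumes, based only on the "flat_double" call in the trace, that rJava copies the whole matrix into a single Java double[] before handing it to R. I have not confirmed that in the source.

```r
# Hypothetical sizes, for illustration only; substitute the real counts.
n_docs   <- 30e6   # assumed number of documents
n_topics <- 100    # assumed number of topics

# Java arrays are indexed by a signed 32-bit int, so one flat array can hold
# at most 2^31 - 1 elements. R's integer type has the same upper bound.
java_int_max <- .Machine$integer.max

# R stores these values as doubles, so the product itself does not overflow.
n_cells <- n_docs * n_topics

# TRUE only if a single flat Java array could hold the whole matrix.
fits_in_java_array <- n_cells <= java_int_max

# Emulate what a 32-bit signed multiplication would produce on the Java side.
wrapped_length <- ((n_cells + 2^31) %% 2^32) - 2^31

cat("cells needed:        ", format(n_cells, scientific = FALSE), "\n")
cat("fits in one array:   ", fits_in_java_array, "\n")
cat("length after wrap:   ", format(wrapped_length, scientific = FALSE), "\n")
```

With these placeholder counts the matrix needs more cells than a single Java array allows, and the wrapped length comes out negative, which would match a NegativeArraySizeException. If the real counts show the same thing, that would also explain why --output-doc-topics works on the command line: it writes to a file and never needs the whole matrix in one array.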