I'm using Mallet 2.0.8 for topic modelling. The application loops over a set of documents and computes a separate topic model for each one; nothing is shared between passes or aggregated afterwards. Each pass constructs a fresh alphabet for the current document with new cc.mallet.types.Alphabet().
However, it looks like the memory consumed by an alphabet cannot be garbage collected after each pass, because the alphabet is still referenced from cc.mallet.types.Alphabet.deserializedEntries. There seems to be a question on the mailing list regarding this issue, too.
As a workaround, I use reflection to set the field cc.mallet.types.Alphabet.deserializedEntries to an instance of a subclass of java.util.concurrent.ConcurrentHashMap<K, V> whose putIfAbsent(K, V) does nothing.
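To make the workaround concrete, here is a minimal sketch of the idea. It does not touch Mallet itself: FakeAlphabet is a hypothetical stand-in that registers every instance in a static map, and NoOpRegistry and RegistryPatch are names made up for this example. It also assumes the target field is static and not final, since Field.set refuses to write static final fields.

```java
import java.lang.reflect.Field;
import java.util.concurrent.ConcurrentHashMap;

// Map that drops every putIfAbsent call, so it never retains a reference.
class NoOpRegistry<K, V> extends ConcurrentHashMap<K, V> {
    @Override
    public V putIfAbsent(K key, V value) {
        return null; // report "no previous mapping" but store nothing
    }
}

// Hypothetical stand-in for cc.mallet.types.Alphabet (NOT the real class):
// each new instance registers itself in a static map and is therefore
// reachable for as long as the class is loaded.
class FakeAlphabet {
    private static ConcurrentHashMap<Object, Object> deserializedEntries =
            new ConcurrentHashMap<>();
    private final Object instanceId = new Object();

    FakeAlphabet() {
        deserializedEntries.putIfAbsent(instanceId, this);
    }

    static int registrySize() {
        return deserializedEntries.size();
    }
}

// Hypothetical helper: replaces a static, non-final map field via reflection.
class RegistryPatch {
    static void install(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            field.setAccessible(true);                        // bypass 'private'
            field.set(null, new NoOpRegistry<Object, Object>()); // null target: static field
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not patch " + fieldName, e);
        }
    }
}
```

In the real application the call would presumably be RegistryPatch.install(cc.mallet.types.Alphabet.class, "deserializedEntries"), made once before the first pass. Entries registered before the patch are dropped together with the old map.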
My questions are now:
- Is this workaround safe for the described use case?
- Is it intended to construct a new alphabet in each pass, or is there a better approach that would not result in a memory leak?