The classifier frequently fails with an OutOfMemoryError. Please advise.
We have a UIMA pipeline that invokes five model jars (based on Mallet CRF), each around 30 MB. -Xms is set to 2G and -Xmx is set to 4G.
Are there any guidelines or benchmarks for setting the heap size? Please also point me to any guidelines for a multi-threaded environment.
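As a sanity check that the -Xms/-Xmx settings actually reach the process running the models, something like the following can be logged at startup. This is only an illustrative sketch (the class name is made up), not part of our pipeline code:

import java.lang.management.ManagementFactory;

public class HeapSettingsCheck {
    public static void main(String[] args) {
        // Heap limits as the JVM actually applied them.
        long maxBytes = Runtime.getRuntime().maxMemory();     // roughly -Xmx
        long committed = Runtime.getRuntime().totalMemory();  // currently committed heap
        System.out.printf("Max heap:       %.2f GB%n", maxBytes / (1024.0 * 1024 * 1024));
        System.out.printf("Committed heap: %.2f GB%n", committed / (1024.0 * 1024 * 1024));
        // JVM arguments as seen from inside the process (-Xms, -Xmx, GC flags, ...).
        System.out.println(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}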
I tried applying the patch from https://code.google.com/p/cleartk/issues/detail?id=408, but it did not resolve the issue.
A heap dump shows that 42% of the heap is char[] and 15% is String.
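In case it matters, a dump like this can also be captured programmatically on a HotSpot JVM; a minimal sketch (the class and output file names are illustrative):

import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumpHelper {
    public static void dumpHeap(String path) throws IOException {
        // HotSpot-specific bean; writes an .hprof file readable by MAT / VisualVM.
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diag.dumpHeap(path, true); // true = dump only live objects
    }

    public static void main(String[] args) throws IOException {
        dumpHeap("classifier-heap.hprof"); // illustrative file name
    }
}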
java.lang.OutOfMemoryError: Java heap space
    at cc.mallet.types.IndexedSparseVector.setIndex2Location(IndexedSparseVector.java:109)
    at cc.mallet.types.IndexedSparseVector.dotProduct(IndexedSparseVector.java:157)
    at cc.mallet.fst.CRF$TransitionIterator.<init>(CRF.java:1856)
    at cc.mallet.fst.CRF$TransitionIterator.<init>(CRF.java:1835)
    at cc.mallet.fst.CRF$State.transitionIterator(CRF.java:1776)
    at cc.mallet.fst.MaxLatticeDefault.<init>(MaxLatticeDefault.java:252)
    at cc.mallet.fst.MaxLatticeDefault.<init>(MaxLatticeDefault.java:197)
    at cc.mallet.fst.MaxLatticeDefault$Factory.newMaxLattice(MaxLatticeDefault.java:494)
    at cc.mallet.fst.MaxLatticeFactory.newMaxLattice(MaxLatticeFactory.java:11)
    at cc.mallet.fst.Transducer.transduce(Transducer.java:124)
    at org.cleartk.ml.mallet.MalletCrfStringOutcomeClassifier.classify(MalletCrfStringOutcomeClassifier.java:90)
The model is created with MalletCrfStringOutcomeDataWriter; the training-time engine description looks like this:

AnalysisEngineFactory.createEngineDescription(DataChunkAnnotator.class,
    CleartkSequenceAnnotator.PARAM_IS_TRAINING, true,
    DirectoryDataWriterFactory.PARAM_OUTPUT_DIRECTORY, options.getModelsDirectory(),
    DefaultSequenceDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME, MalletCrfStringOutcomeDataWriter.class)
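At classification time the same annotator is instantiated with training disabled. A sketch of that description (not our exact code), assuming the jar path parameter from org.cleartk.ml.jar.GenericJarClassifierFactory; the "model.jar" file name is only illustrative:

// Sketch: classification-time engine description for the same annotator.
AnalysisEngineDescription classifyDescription = AnalysisEngineFactory.createEngineDescription(
    DataChunkAnnotator.class,
    CleartkSequenceAnnotator.PARAM_IS_TRAINING, false,
    GenericJarClassifierFactory.PARAM_CLASSIFIER_JAR_PATH,
    new File(options.getModelsDirectory(), "model.jar").getPath());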
The annotator code looks as follows:

if (this.isTraining()) {
    // Training path: map gold DataAnnotation spans to per-token outcomes and write instances.
    List<DataAnnotation> namedEntityMentions = JCasUtil.selectCovered(jCas, DataAnnotation.class, sentence);
    List<String> outcomes = this.chunking.createOutcomes(jCas, tokens, namedEntityMentions);
    this.dataWriter.write(Instances.toInstances(outcomes, featureLists));
} else {
    // Classification path: this classify() call is where the OutOfMemoryError is thrown.
    List<String> outcomes = this.classifier.classify(featureLists);
    this.chunking.createChunks(jCas, tokens, outcomes);
}
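For completeness, after the training pass the model jar is built from the written data roughly as follows (a sketch, assuming the same models directory that was passed as PARAM_OUTPUT_DIRECTORY above):

// Sketch: package the data written by MalletCrfStringOutcomeDataWriter into a model jar.
org.cleartk.ml.jar.Train.main(options.getModelsDirectory());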
Thanks