Sorry, I'm quite a beginner in NLP. As the title says, what is the best optimization interval in the MALLET API? I was also wondering whether it depends on the number of iterations, the number of topics, the corpus size, etc.
-
Is this for the `--optimize-interval` option in training topic models? – David Mimno Nov 16 '17 at 15:02
-
@DavidMimno yes, in the API the code for this is `.setOptimizeInterval(num);` – CaffeineMakesMeSleepyHelp Nov 17 '17 at 03:19
2 Answers
The optimization interval is the number of iterations between hyperparameter updates. Values between 20 and 50 seem to work well, but I haven't done any systematic tests. One possible failure mode to look out for is that too many optimization rounds could lead to instability, with the alpha hyperparameters going to zero.
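To get a feel for how the interval interacts with the number of iterations, here is a minimal sketch that counts how many hyperparameter update rounds a given setting would produce. It does not call MALLET. It assumes that an update fires on every iteration that is a multiple of the interval once a burn-in period has passed. The class name, the method name, and the values (1000 iterations, burn-in of 200) are illustrative assumptions, so check them against your MALLET version.

```java
import java.util.ArrayList;
import java.util.List;

public class OptimizeSchedule {

    // Returns the 1-based iterations at which a hyperparameter update would fire.
    // ASSUMED rule (not taken from the thread): iteration > burnIn
    // and iteration % interval == 0. An interval <= 0 is treated as "disabled".
    static List<Integer> updateIterations(int numIterations, int burnIn, int interval) {
        List<Integer> updates = new ArrayList<>();
        if (interval <= 0) {
            return updates;
        }
        for (int iteration = 1; iteration <= numIterations; iteration++) {
            if (iteration > burnIn && iteration % interval == 0) {
                updates.add(iteration);
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        int numIterations = 1000; // hypothetical training length
        int burnIn = 200;         // hypothetical burn-in period
        for (int interval : new int[] {10, 20, 50}) {
            List<Integer> updates = updateIterations(numIterations, burnIn, interval);
            System.out.println("interval=" + interval
                    + " -> " + updates.size() + " update rounds");
        }
    }
}
```

Under these assumptions, halving the interval doubles the number of update rounds for the same number of iterations, so it makes sense to choose the interval and the iteration count together if you are worried about the instability mentioned above.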

Here is an interesting blog post in which Christof Schöch ran some systematic tests: "Topic Modeling with MALLET: Hyperparameter Optimization".
TL;DR:
It all depends on the project’s aims. But it is important that we are aware of the massive effects Mallet’s inconspicuous parameter of the hyperparameter optimization can have on the resulting models.
EDIT: The authors did not fix the random seed, so the observed differences might instead be explained by MALLET's random initialization.
