I'm working with the Mallet library for a project in Java.
I have 15,000 documents with 400 tokens each. I tried using ParallelTopicModel, but I would like a set of topics that contains both single tokens and sequences of tokens (e.g. "Java" as well as "Java Developer").
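To make the goal concrete, here is a minimal sketch in plain Java (no Mallet involved) of the kind of mixed vocabulary I mean: every single token plus every contiguous sequence up to a given length. The class name `NgramSketch` and the method `terms` are hypothetical names used only for illustration, and splitting on whitespace is an assumption, not how Mallet tokenizes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only: lists the single tokens and the
// token sequences (up to maxN tokens long) found in a piece of text.
class NgramSketch {

    // Returns every contiguous run of 1..maxN whitespace-separated tokens,
    // shortest runs first, in order of appearance.
    static List<String> terms(String text, int maxN) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out; // nothing to extract from blank input
        }
        String[] tokens = trimmed.split("\\s+");
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= tokens.length; i++) {
                // Join tokens[i .. i+n) back into one space-separated term.
                out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Yields both "Java" and "Java Developer" as candidate terms.
        System.out.println(terms("Senior Java Developer", 2));
    }
}
```

What I am after is a topic model whose topics can contain both kinds of terms, rather than a preprocessing step like this one.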
I am considering using LDA-HMM. Which Mallet class can I use for that?
After that, I plan to turn each topic into a node of a Bayesian network that receives a token or a sequence of tokens as evidence and performs inference. Which Java library can I use for that?
Thanks in advance. Francesco