6

I'm currently trying to evaluate my topic models with gensim's CoherenceModel:

from gensim.models.coherencemodel import CoherenceModel
cm_u_mass = CoherenceModel(model = model1, corpus = corpus1, coherence = 'u_mass')
coherence_u_mass = cm_u_mass.get_coherence()

print('\nCoherence Score: ', coherence_u_mass)

The output consists only of negative values. Is this correct? Can anybody provide a formula or explain how u_mass works?

Nils_Denter

2 Answers

12

A quick look at the original article shows that UMass coherence is calculated from the logarithm of probabilities (ratios that are usually below 1), which is why it is typically negative.

As for the formula you asked about, it is given as equation 4 in the same article.
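For reference, here is the UMass coherence of a single topic as I recall it from Mimno et al. (2011), written out as a sketch. The symbols are my own labels: $V^{(t)} = (v_1^{(t)}, \dots, v_M^{(t)})$ are the top $M$ words of topic $t$ ordered by probability, $D(v)$ is the number of documents containing $v$, and $D(v, v')$ is the number of documents containing both words. Library implementations, gensim's included, may use a different smoothing constant and may average over word pairs and topics rather than sum, so do not expect identical numbers.

$$
C\left(t; V^{(t)}\right) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{D\left(v_m^{(t)}, v_l^{(t)}\right) + 1}{D\left(v_l^{(t)}\right)}
$$

Each term is the log of a smoothed estimate of how often $v_m$ appears in documents that contain $v_l$. That ratio is usually below 1, so most terms, and therefore the sum, are negative.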

As I understand it, the closer the UMass coherence is to 0, the better the topic coherence.
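To see why the scores come out negative, here is a minimal pure-Python sketch of a UMass-style score for one topic, assuming the document-count form with +1 smoothing from Mimno et al. The function name `umass_coherence`, the helper `doc_freq`, and the toy corpus are made up for illustration. This is not gensim's code, and gensim's numbers will likely differ because its smoothing and averaging are not the same.

```python
import math

def umass_coherence(top_words, documents):
    """UMass-style coherence for one topic (hypothetical helper, not gensim's API).

    top_words: topic words ordered from most to least probable.
    documents: list of tokenized documents (lists of strings).
    Assumes every word in top_words occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            v_m, v_l = top_words[m], top_words[l]
            # log of the smoothed co-document count over the count of the
            # higher-ranked word; the +1 avoids log(0).
            score += math.log((doc_freq(v_m, v_l) + 1) / doc_freq(v_l))
    return score

# Toy corpus, invented for this example.
docs = [
    ["cat", "dog", "pet"],
    ["cat", "pet", "food"],
    ["dog", "pet", "walk"],
    ["pet", "food", "vet"],
    ["cat", "dog"],
    ["stock", "market", "trade"],
    ["market", "trade", "price"],
    ["stock", "price", "pet"],
]

coherent = umass_coherence(["pet", "cat", "dog"], docs)
mixed = umass_coherence(["pet", "market", "cat"], docs)
print(coherent, mixed)
```

Both scores are negative, and the topic whose words actually co-occur (`coherent`) ends up closer to 0 than the one mixing unrelated words (`mixed`).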

Hope this helps.

  • 3
    Actually the original article is by David Mimno et al. [Optimizing Semantic Coherence in Topic Models](https://www.aclweb.org/anthology/D11-1024.pdf) – J.Schneider Mar 31 '21 at 17:02
0

The accepted answer is wrong. For UMass, coherence typically starts at its highest values (i.e., closest to zero) when the number of topics is small and decreases as the number of topics increases. You can see this trend in this article. This is the opposite of the overall trend you see for c_v. In short, you look for a trade-off between the number of topics and the most negative UMass score.

Kambiz