Negative Values: Evaluate Gensim LDA with Topic Coherence

Question

I'm currently trying to evaluate my topic models with Gensim's CoherenceModel:

from gensim.models.coherencemodel import CoherenceModel

# model1: trained topic model; corpus1: bag-of-words corpus (both defined elsewhere)
cm_u_mass = CoherenceModel(model=model1, corpus=corpus1, coherence='u_mass')
coherence_u_mass = cm_u_mass.get_coherence()

print('\nCoherence Score: ', coherence_u_mass)

The output contains only negative values. Is this correct? Can anybody provide a formula or explain how u_mass works?

score 12 · Accepted Answer · answered Jan 24 '19 at 20:29


A quick look at the original article shows that UMass coherence is computed from logarithms of probabilities, which is why it is negative.

As for the formula you asked about, it is equation 4 in the same article.
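For reference, here is the measure as I understand it from Mimno et al., "Optimizing Semantic Coherence in Topic Models". The notation is theirs as far as I recall it; other papers and libraries restate it with probabilities, a different smoothing constant, or an average over word pairs instead of a sum, so treat this as a sketch rather than the exact expression Gensim evaluates.

```latex
C\left(t; V^{(t)}\right) = \sum_{m=2}^{M} \sum_{l=1}^{m-1}
  \log \frac{D\left(v_m^{(t)}, v_l^{(t)}\right) + 1}{D\left(v_l^{(t)}\right)}
```

Here $V^{(t)} = (v_1^{(t)}, \dots, v_M^{(t)})$ is the list of the $M$ most probable words of topic $t$, $D(v)$ is the number of documents containing word $v$, and $D(v, v')$ is the number of documents containing both words. Each term is the logarithm of a ratio that is usually below 1, so the sum is usually negative.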

As I understand it, the closer the UMass coherence is to 0, the better the topic coherence.
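To make the sign and the "closer to 0 is better" reading concrete, here is a minimal hand-rolled sketch of a UMass-style score on a toy corpus. It is not Gensim's implementation: the function name `umass_coherence`, the toy documents, and the +1 smoothing (taken from the Mimno et al. formulation) are all assumptions made for illustration, and Gensim may smooth and aggregate differently.

```python
import math

def umass_coherence(top_words, documents):
    """Sum of log((D(w_i, w_j) + 1) / D(w_j)) over pairs with j < i.

    D(w) counts documents containing w; D(w_i, w_j) counts documents
    containing both. Every word in top_words must occur in at least one
    document, otherwise D(w_j) would be zero.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            score += math.log((doc_freq(w_i, w_j) + 1) / doc_freq(w_j))
    return score

# Toy corpus (hypothetical data): five "pet" documents, three "finance" ones.
documents = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "vet"],
    ["cat", "pet", "food"],
    ["cat", "food"],
    ["dog", "walk"],
    ["stock", "market", "trade"],
    ["stock", "trade", "bank"],
    ["stock", "bank"],
]

coherent = umass_coherence(["cat", "dog", "pet"], documents)
mixed = umass_coherence(["cat", "stock", "pet"], documents)

print("coherent topic:", coherent)
print("mixed topic:   ", mixed)
```

Both scores come out negative, and the topic whose words actually co-occur lands closer to zero than the mixed one.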

Hope this helps.


Francisco Nicolai Manaut


Actually, the original article is by David Mimno et al.: [Optimizing Semantic Coherence in Topic Models](https://www.aclweb.org/anthology/D11-1024.pdf) – J.Schneider Mar 31 '21 at 17:02

score 0 · Answer 2 · answered Jan 10 '23 at 19:30

The accepted answer is wrong. For UMass, the coherence typically starts at its highest values (i.e., close to zero) and decreases as the number of topics increases. You can see this trend in this article. The overall trend is the opposite of what you see for c_v. In short, you look for a trade-off between the number of topics and the most negative UMass score.
