Not sure if this is the right forum, but I was wondering if anyone understands how to interpret the width of the red vs. blue bars on the right-hand side of pyLDAvis plots when lambda = 0 (see http://www.kennyshirley.com/LDAvis/#topic=0&lambda=0.01&term= for a demo). I understand that when lambda = 1, the red bars represent the counts of the terms in a given topic, and the gray bars represent the counts of the same terms overall in the corpus. I don't understand what's displayed when lambda = 0 and why the bars no longer seem to be ordered in any way. Could you help?
3 Answers
In simple words:
Values of lambda that are very close to zero will show terms that are more specific for a chosen topic. Meaning that you will see terms that are "important" for that specific topic but not necessarily "important" for the whole corpus.
Values of lambda that are very close to one will show those terms that have the highest ratio between frequency of the terms for that specific topic and the overall frequency of the terms from the corpus.
You can find more info about lambda and LDAvis here: LDAvis: A method for visualizing and interpreting topics

Values of lambda that are very close to one will show those terms that have the highest ratio between frequency of the terms for that specific topic and the overall frequency of the terms from the corpus
I don't agree with Ledian K's sentence above. The paper he linked helped explain how to interpret the bars though.
Relevance in that paper is a weighted average of 2 terms. The single variable lambda controls the weighting. The 2 terms are built from 2 ideas:
- pw: empirical distribution of a term in the corpus
- phikw: probability of term given a topic
The gray/blue (color depending on version) bar represents idea 1. The red bar represents idea 2. The 1st term in the weighted average is log(idea 2), i.e. log(phikw). The 2nd term in the weighted average is log(idea 2 / idea 1), i.e. log(phikw/pw).
Lambda = 1 means the relevance is defined entirely by the 1st term, which only contains idea 2. Lambda = 0 means the relevance is defined entirely by the 2nd term, which contains idea 2 divided by idea 1 (the lift, also referenced in that paper as Taddy (2011)).
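To make the weighted average concrete, here is a minimal sketch of the relevance formula from the paper, relevance = lambda * log(phikw) + (1 - lambda) * log(phikw/pw). The function names and the three-term vocabulary with its probabilities are made up for illustration; this is not pyLDAvis's own code.

```python
import numpy as np

def relevance(phi_kw, p_w, lam):
    """lam * log(phi_kw) + (1 - lam) * log(phi_kw / p_w) for every term."""
    phi_kw = np.asarray(phi_kw, dtype=float)
    p_w = np.asarray(p_w, dtype=float)
    return lam * np.log(phi_kw) + (1.0 - lam) * np.log(phi_kw / p_w)

# Hypothetical numbers, chosen only to show the effect of lambda.
terms = ["the", "model", "dirichlet"]
phi_kw = np.array([0.50, 0.30, 0.20])  # P(term | topic k), "idea 2"
p_w = np.array([0.60, 0.10, 0.02])     # P(term) in the whole corpus, "idea 1"

def ranked_terms(lam):
    """Terms sorted from most to least relevant for the given lambda."""
    order = np.argsort(-relevance(phi_kw, p_w, lam))
    return [terms[i] for i in order]

print(ranked_terms(1.0))  # ranked by P(term | topic) only
print(ranked_terms(0.0))  # ranked by lift, P(term | topic) / P(term)
```

With these numbers, lambda = 1 puts the common word "the" first, while lambda = 0 puts the rare but topic-specific "dirichlet" first.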
Graphical interpretation
lambda = 1 means the bars are sorted by red bar width, never mind what the blue bars are, meaning the lift over the marginal (summed across topics) term probabilities is ignored. You can also observe that the blue bar widths are the same no matter which topic circle you click, because they are a corpus-wide property, independent of topics.
lambda = 0 means the bars are sorted by the ratio of red bar coverage over blue bar (with a maximum ratio of 1, meaning the red bar completely covers the blue).
As lambda increases, you will see more blue bar not being covered by red.
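Why sorting by lift is the same as sorting by red/blue coverage: if the red bar is the estimated count of a term in the topic and the blue bar is its overall count, then for a fixed topic the lift is just the coverage ratio times a constant. A small sketch with made-up counts (all names and numbers here are hypothetical):

```python
import numpy as np

# Hypothetical bar widths for four terms in one selected topic.
terms = ["alpha", "beta", "gamma", "delta"]
overall = np.array([1000.0, 200.0, 50.0, 400.0])  # blue bars: corpus-wide counts
in_topic = np.array([300.0, 150.0, 50.0, 40.0])   # red bars: counts within topic

corpus_tokens = 10000.0  # assumed total tokens in the corpus
topic_tokens = 2000.0    # assumed total tokens assigned to this topic

phi_kw = in_topic / topic_tokens  # P(term | topic)
p_w = overall / corpus_tokens     # P(term)

lift = phi_kw / p_w               # what lambda = 0 sorts by
coverage = in_topic / overall     # fraction of the blue bar covered by red

# lift == coverage * (corpus_tokens / topic_tokens), so both give the same order.
by_lift = [terms[i] for i in np.argsort(-lift)]
by_coverage = [terms[i] for i in np.argsort(-coverage)]
print(by_lift)
print(by_coverage)
```

Note that "gamma" has the narrowest bars but comes first, because its red bar covers its blue bar completely. That is why the bar widths look unordered at lambda = 0.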
I'm guessing the lengths of bars do not match the frequencies seen from Dictionary.cfs because of the smoothing mentioned in the paper. Also addressed here: https://pyldavis.readthedocs.io/en/latest/history.html
The downside is that the blue bar widths do not necessarily match the user-supplied term frequencies exactly – in fact, the new version of LDAvis ignores the user-supplied term frequencies entirely

Check out http://qpleple.com/word-relevance/
When lambda = 1, the relevance score = P(word | topic k); in other words, words are ranked by how frequent they are within topic k.

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Tiago Silva Sep 23 '19 at 10:40