LDAvis provides an excellent way of visualising and exploring topic models. LDAvis requires 5 parameters:
- phi (matrix with dimensions number of topics times number of terms)
- theta (matrix with dimensions number of documents times number of topics)
- number of words per document (integer vector)
- the vocabulary (character vector)
- the word frequency in the whole corpus (integer vector)
The short version of my question is: after fitting an LDA model with Vowpal Wabbit, how does one derive phi and theta?
theta represents the mixture of topics per document, and must thus sum to 1 per document. phi represents the probability of a term given the topic, and must thus sum to 1 per topic.
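To make the target concrete, here is a minimal sanity check in Python/numpy (my own sketch, not from the LDAvis docs; the array orientation is my assumption):

    import numpy as np

    def check_distributions(phi, theta, atol=1e-6):
        # phi: topics x terms, theta: documents x topics (assumed orientation)
        assert np.allclose(phi.sum(axis=1), 1.0, atol=atol), "each phi row (topic) must sum to 1"
        assert np.allclose(theta.sum(axis=1), 1.0, atol=atol), "each theta row (document) must sum to 1"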
After running LDA with Vowpal Wabbit (`vw`), some kind of weights are stored in a model file. A human-readable version of that model can be acquired by feeding in a special file, with one document per term in the vocabulary, while deactivating learning (via the `-t` parameter), e.g.:
    vw -t -i weights -d dictionary.vw --readable_model readable.model.txt
According to the Vowpal Wabbit documentation, all columns except the first one of `readable.model.txt` now "represent the per-word topic distributions."
You can also generate predictions with `vw`, e.g. for a collection of documents:
    vw -t -i weights -d some-documents.txt -p predictions.txt
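Analogously, my guess is that theta would come from normalising each row of the predictions, assuming one row of topic weights per document and no extra tag column (again only a sketch under those assumptions):

    import numpy as np

    preds = np.loadtxt("predictions.txt")             # documents x topics (assumed layout)
    theta = preds / preds.sum(axis=1, keepdims=True)  # normalise per document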
Both `predictions.txt` and `readable.model.txt` have dimensions that reflect the number of inputs (rows) and the number of topics (columns), but neither of them contains probability distributions, because the values do not sum to 1 (neither per row nor per column).
I understand that `vw` is not for the faint-hearted and that some programming/scripting will be required on my part, but I'm sure there must be some way to derive theta and phi from the output of `vw`. I've been stuck on this problem for days now; please give me some hints.