
I was using the LDAvis package's createJSON() function with a topic model that has 2 topics and received this error:

Error in stats::cmdscale(dist.mat, k = 2) : 'k' must be in {1, 2, .. n - 1} 

I then tried the reproducible example given here, setting K = 2 and keeping everything else the same, and hit the same error in createJSON().

Looking at the source code of createJSON() here, the issue is in the function jsPCA(). With K = 2, dist.mat holds a single value, which triggers the error at this line:

pca.fit <- stats::cmdscale(dist.mat, k = 2) 
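For what it's worth, the failure can be reproduced in base R without LDAvis. The sketch below uses a made-up 2 x 2 matrix in place of phi and plain Euclidean dist() instead of the Jensen-Shannon distance, so it only illustrates the cmdscale() constraint, not the exact LDAvis computation:

```r
# Two rows (two "topics") give a distance object with a single pairwise distance.
# The numbers are made up for illustration.
toy_phi <- matrix(c(0.7, 0.3,
                    0.2, 0.8), nrow = 2, byrow = TRUE)
d <- stats::dist(toy_phi)

# cmdscale() requires k <= n - 1, so with n = 2 points k = 2 is rejected.
res <- tryCatch(stats::cmdscale(d, k = 2),
                error = function(e) conditionMessage(e))
print(res)

# k = 1 is accepted and returns one coordinate per point.
fit1 <- stats::cmdscale(d, k = 1)
print(dim(fit1))
```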

Any advice on how to get past this error?

anonR

1 Answer


Your problem stems from a division-by-zero issue in the jensenShannon function that sits inside jsPCA. The entire jsPCA code looks like this:

 jsPCA <- function(phi) {
   jensenShannon <- function(x, y) {
     m <- 0.5 * (x + y)
     0.5 * sum(x * log(x / m)) + 0.5 * sum(y * log(y / m))
   }
   dist.mat <- proxy::dist(x = phi, method = jensenShannon)
   pca.fit <- stats::cmdscale(dist.mat, k = 2)
   data.frame(x = pca.fit[, 1], y = pca.fit[, 2])
 }
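To see how zeros affect the jensenShannon helper, here is a small sketch that copies it out of jsPCA and calls it on made-up probability vectors (the vectors and the names js_zero and js_ok are illustrative only):

```r
# Same helper as inside jsPCA(), copied out so it can be called directly.
jensenShannon <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * sum(x * log(x / m)) + 0.5 * sum(y * log(y / m))
}

# The third term has zero probability in both vectors, so m is 0 there
# and x / m evaluates to 0 / 0.
p <- c(0.5, 0.5, 0.0)
q <- c(0.9, 0.1, 0.0)
js_zero <- jensenShannon(p, q)
print(is.nan(js_zero))

# With strictly positive probabilities the divergence is finite.
js_ok <- jensenShannon(c(0.5, 0.4, 0.1), c(0.8, 0.1, 0.1))
print(is.finite(js_ok))
```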

If m contains zeros, x/m evaluates to 0/0 and the result is NaN. The errors trickle through from there. So you can prevent the error by specifying a dimensionality-reduction method that tolerates zeros. Indeed, the LDAvis documentation provides an option rooted in t-SNE:

 library("tsne")
 svd_tsne <- function(x) tsne(svd(x)$u)

Simply pass this function as the mds.method argument, and you should be good to go:

 createJSON(phi, theta, doc.length, vocab, term.frequency,
            mds.method = svd_tsne,
            plot.opts = list(xlab="", ylab="")
            )
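If you would rather stay close to the default scaling, another option is a custom function for mds.method. The sketch below is not part of LDAvis: js_mds_safe is a hypothetical helper that assumes phi has at least 2 rows, treats 0 * log(0) as 0, computes the distances in base R instead of proxy::dist, and caps the cmdscale() dimension at n - 1, padding the missing coordinate with zeros. It also assumes createJSON() accepts any function that takes phi and returns a K-by-2 data frame.

```r
# Hypothetical helper, not an LDAvis function.
js_mds_safe <- function(phi) {
  jensenShannon <- function(x, y) {
    m <- 0.5 * (x + y)
    # Terms with zero probability contribute 0 instead of NaN.
    part <- function(a) sum(ifelse(a > 0, a * log(a / m), 0))
    0.5 * part(x) + 0.5 * part(y)
  }

  K <- nrow(phi)
  d <- matrix(0, K, K)
  for (i in seq_len(K)) {
    for (j in seq_len(K)) {
      d[i, j] <- jensenShannon(phi[i, ], phi[j, ])
    }
  }

  # cmdscale() needs k <= n - 1, so two topics get a single dimension.
  k <- min(2, K - 1)
  fit <- stats::cmdscale(stats::as.dist(d), k = k)

  # Pad with a zero column when only one coordinate was returned.
  if (ncol(fit) < 2) fit <- cbind(fit, 0)
  data.frame(x = fit[, 1], y = fit[, 2])
}

# Made-up two-topic phi containing a zero entry.
toy_phi <- matrix(c(0.6, 0.4, 0.0,
                    0.1, 0.2, 0.7), nrow = 2, byrow = TRUE)
coords <- js_mds_safe(toy_phi)
print(coords)
```

Under those assumptions it would be passed the same way as svd_tsne, that is with mds.method = js_mds_safe. With two topics both points end up on a single axis, which is all the geometry two points can carry.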
user2047457