I'm trying to implement the pLSA algorithm proposed by Thomas Hofmann (1999). However, all the implementations I have found treat the input term-document matrix as dense rather than sparse. Since my input matrix is large and sparse, I would like to find an implementation that supports sparse input. Could you help me find one? Matlab or Java is preferred.

UPDATE: I have found that PennAspect (http://www.cis.upenn.edu/~ungar/Datamining/software_dist/PennAspect/index.html) does in fact implement PLSA with sparse matrix input.

The solution is simple: a 2D ragged array (an array whose rows do not all have the same length) can be used to represent the sparse matrix, so that each row stores only its non-zero entries.
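To make the ragged-array idea concrete, here is a minimal Java sketch. It is only one way to lay the data out, not PennAspect's actual code: the class name `RaggedSparseMatrix`, the fields `termIds` and `counts`, and the helper methods are all hypothetical names chosen for illustration. Each row keeps two parallel arrays, one with the column indices of its non-zero entries and one with the corresponding values.

```java
// Hypothetical sketch: a sparse matrix stored as two parallel ragged arrays.
class RaggedSparseMatrix {
    // termIds[r] holds the column indices of the non-zero entries in row r.
    final int[][] termIds;
    // counts[r][j] is the value stored at (r, termIds[r][j]).
    final double[][] counts;

    RaggedSparseMatrix(int[][] termIds, double[][] counts) {
        this.termIds = termIds;
        this.counts = counts;
    }

    // Build the ragged representation from a dense matrix (for small tests only).
    static RaggedSparseMatrix fromDense(double[][] dense) {
        int[][] ids = new int[dense.length][];
        double[][] vals = new double[dense.length][];
        for (int r = 0; r < dense.length; r++) {
            int nnz = 0;
            for (double v : dense[r]) {
                if (v != 0.0) nnz++;
            }
            ids[r] = new int[nnz];
            vals[r] = new double[nnz];
            int next = 0;
            for (int c = 0; c < dense[r].length; c++) {
                if (dense[r][c] != 0.0) {
                    ids[r][next] = c;
                    vals[r][next] = dense[r][c];
                    next++;
                }
            }
        }
        return new RaggedSparseMatrix(ids, vals);
    }

    // Look up a single entry with a linear scan; entries not stored are zero.
    double get(int row, int col) {
        for (int j = 0; j < termIds[row].length; j++) {
            if (termIds[row][j] == col) return counts[row][j];
        }
        return 0.0;
    }

    // Total number of stored (non-zero) entries.
    int nonZeros() {
        int total = 0;
        for (int[] row : termIds) total += row.length;
        return total;
    }
}
```

Memory use is proportional to the number of non-zero entries rather than to rows times columns, and an EM pass can simply loop over `termIds[r]` for each row instead of over every column.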

Jia

1 Answer

I know it's late, but I was also searching for an answer and finally implemented it on my own. I am new to R, but I loved this algorithm and was advised to implement it in R. It works perfectly on my large sparse DTM (document-term matrix) with 10 iterations:

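## PLSA algorithm
## mat: document-term matrix (documents in rows, terms in columns)
k <- 100  # number of topics

# Normalise a vector so that its entries sum to 1
funnorm <- function(matrow) {
  matrow / sum(matrow)
}

# P1: k x terms, each row is P(term | topic), randomly initialised
P1 <- t(apply(matrix(sample.int(46, k * dim(mat)[2], TRUE), k, dim(mat)[2]), 1, funnorm))

# P2: documents x k, each row is P(topic | document), randomly initialised
P2 <- t(apply(matrix(sample.int(46, dim(mat)[1] * k, TRUE), dim(mat)[1], k), 1, funnorm))

for (n in 1:10) {
  P3 <- P2 %*% P1  # documents x terms: current model estimate P(term | document)
  P4 <- mat / P3   # observed counts divided by the model estimate

  # Update P(topic | document)
  P5 <- P4 %*% t(P1)
  P6 <- P2 * P5
  P2new <- P6 / rowSums(P6)

  # Update P(term | topic)
  P5 <- t(P2) %*% P4
  P6 <- P1 * P5
  P1new <- P6 / rowSums(P6)

  P1 <- P1new
  P2 <- P2new
}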
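Note that P3 in the loop above is a full documents-by-terms matrix, which may not fit in memory for very large inputs. Since the question asked about Java, here is a rough sketch of the same update written so that it only visits the non-zero entries. It assumes the matrix is stored as two ragged arrays (`termIds[d]` for the term indices present in document d, `counts[d]` for their counts); the class and method names are made up for illustration and are not taken from any library.

```java
// Hypothetical sketch: one pLSA EM iteration that touches only non-zero counts.
class SparsePlsaSketch {

    // pTopicGivenDoc is documents x topics (P2 above);
    // pTermGivenTopic is topics x terms (P1 above). Both are updated in place.
    static void emStep(int[][] termIds, double[][] counts,
                       double[][] pTopicGivenDoc, double[][] pTermGivenTopic) {
        int numTopics = pTermGivenTopic.length;
        int numTerms = pTermGivenTopic[0].length;
        double[][] termAcc = new double[numTopics][numTerms];
        double[] posterior = new double[numTopics];

        for (int d = 0; d < termIds.length; d++) {
            double[] topicAcc = new double[numTopics];
            for (int j = 0; j < termIds[d].length; j++) {
                int w = termIds[d][j];
                // E-step: unnormalised P(topic | document, term) for this entry.
                double norm = 0.0;
                for (int z = 0; z < numTopics; z++) {
                    posterior[z] = pTopicGivenDoc[d][z] * pTermGivenTopic[z][w];
                    norm += posterior[z];
                }
                if (norm == 0.0) continue; // the model gives this entry no mass
                // Accumulate expected counts for both M-step updates.
                for (int z = 0; z < numTopics; z++) {
                    double expected = counts[d][j] * posterior[z] / norm;
                    topicAcc[z] += expected;
                    termAcc[z][w] += expected;
                }
            }
            // Row d of pTopicGivenDoc is not read again in this pass.
            normaliseInto(topicAcc, pTopicGivenDoc[d]);
        }
        // The old pTermGivenTopic was used for every document; replace it now.
        for (int z = 0; z < numTopics; z++) {
            normaliseInto(termAcc[z], pTermGivenTopic[z]);
        }
    }

    // Write source / sum(source) into target; leave target unchanged if the sum is 0.
    static void normaliseInto(double[] source, double[] target) {
        double sum = 0.0;
        for (double v : source) sum += v;
        if (sum == 0.0) return;
        for (int i = 0; i < source.length; i++) target[i] = source[i] / sum;
    }

    // Log-likelihood of the observed counts, up to the constant P(document) term.
    static double logLikelihood(int[][] termIds, double[][] counts,
                                double[][] pTopicGivenDoc, double[][] pTermGivenTopic) {
        double ll = 0.0;
        for (int d = 0; d < termIds.length; d++) {
            for (int j = 0; j < termIds[d].length; j++) {
                double p = 0.0;
                for (int z = 0; z < pTermGivenTopic.length; z++) {
                    p += pTopicGivenDoc[d][z] * pTermGivenTopic[z][termIds[d][j]];
                }
                ll += counts[d][j] * Math.log(p);
            }
        }
        return ll;
    }
}
```

Each pass costs roughly (number of non-zero entries) times (number of topics), and the only extra memory is one topics-by-terms accumulator. Calling `logLikelihood` before and after `emStep` is a cheap sanity check, since EM should never decrease it.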

Hope it helps anybody still looking for this.

MItrajyoti