I'm trying to implement the pLSA algorithm proposed by Thomas Hofmann (1999). However, all the implementations I have found treat the input term-document matrix as dense rather than sparse. My input matrix is large and sparse, so I would like an implementation that exploits the sparsity. Could you help me find one? Matlab or Java is preferred.
UPDATE: I have found that PennAspect (http://www.cis.upenn.edu/~ungar/Datamining/software_dist/PennAspect/index.html) does in fact implement PLSA with sparse matrix input.
The solution is simple: a 2D ragged array (an array whose rows do not all have the same length) can represent the sparse matrix. Each row holds only the terms that actually occur in one document.
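Below is a minimal sketch of the ragged-array idea in Java. It is not PennAspect's code, and the class and field names (`SparsePlsa`, `termIds`, `counts`, `pzd`, `pwz`) are hypothetical. It assumes the asymmetric parameterization P(w|d) = Σ_z P(z|d) P(w|z). `termIds[d]` lists the term indices present in document d, and `counts[d]` holds the matching counts. The EM loop visits only those nonzero cells. The posteriors P(z|d,w) are computed on the fly and accumulated straight into the M-step sums, so they are never stored.

```java
import java.util.Random;

/** Hypothetical sketch: pLSA fitted by EM over a ragged-array sparse term-document matrix. */
class SparsePlsa {
    final int[][] termIds;   // termIds[d] = indices of the terms occurring in document d
    final double[][] counts; // counts[d][k] = n(d, termIds[d][k])
    final int numTerms, numTopics;
    final double[][] pzd;    // P(z|d), [numDocs][numTopics]
    final double[][] pwz;    // P(w|z), [numTopics][numTerms]

    SparsePlsa(int[][] termIds, double[][] counts, int numTerms, int numTopics, long seed) {
        this.termIds = termIds;
        this.counts = counts;
        this.numTerms = numTerms;
        this.numTopics = numTopics;
        Random rng = new Random(seed);
        pzd = randomStochastic(termIds.length, numTopics, rng);
        pwz = randomStochastic(numTopics, numTerms, rng);
    }

    // Random strictly positive rows, each normalized to sum to 1.
    static double[][] randomStochastic(int rows, int cols, Random rng) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int j = 0; j < cols; j++) row[j] = rng.nextDouble() + 1e-3;
            normalize(row);
        }
        return m;
    }

    static void normalize(double[] v) {
        double s = 0;
        for (double x : v) s += x;
        if (s > 0) for (int i = 0; i < v.length; i++) v[i] /= s;
    }

    /** One EM iteration. Returns sum_{d,w} n(d,w) log P(w|d) under the parameters before the update. */
    double step() {
        double[][] newPzd = new double[pzd.length][numTopics];
        double[][] newPwz = new double[numTopics][numTerms];
        double[] post = new double[numTopics];
        double logLik = 0;
        for (int d = 0; d < termIds.length; d++) {
            for (int k = 0; k < termIds[d].length; k++) { // nonzero cells only
                int w = termIds[d][k];
                double n = counts[d][k];
                double pwd = 0; // P(w|d) = sum_z P(z|d) P(w|z)
                for (int z = 0; z < numTopics; z++) {
                    post[z] = pzd[d][z] * pwz[z][w]; // E-step numerator of P(z|d,w)
                    pwd += post[z];
                }
                logLik += n * Math.log(pwd);
                for (int z = 0; z < numTopics; z++) {
                    double r = n * post[z] / pwd; // n(d,w) * P(z|d,w)
                    newPzd[d][z] += r;            // M-step sufficient statistics
                    newPwz[z][w] += r;
                }
            }
        }
        // M-step: normalize the expected counts into new distributions.
        for (int d = 0; d < pzd.length; d++) { normalize(newPzd[d]); pzd[d] = newPzd[d]; }
        for (int z = 0; z < numTopics; z++) { normalize(newPwz[z]); pwz[z] = newPwz[z]; }
        return logLik;
    }

    public static void main(String[] args) {
        // Toy 3-document, 3-term matrix stored as ragged rows of nonzeros.
        int[][] ids = {{0, 1}, {1, 2}, {0}};
        double[][] cnt = {{2, 1}, {3, 1}, {4}};
        SparsePlsa model = new SparsePlsa(ids, cnt, 3, 2, 42L);
        for (int it = 0; it < 20; it++) {
            System.out.println("iteration " + it + " log-likelihood " + model.step());
        }
    }
}
```

Each iteration costs O(nnz × K), where nnz is the number of nonzero cells and K the number of topics, rather than O(documents × terms × K). P(w|z) is still kept dense, which is usually affordable because K is small.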