R: Spectral clustering for text classification

Question

I am using the tm package to create a corpus of documents and I want to use spectral clustering (kernlab package) for text classification.

So, if I have a corpus

my_corpus = VCorpus(DirSource(directory="C:/Users/me/Desktop/Documents", pattern="txt"))

And I want to perform spectral clustering using the specc function which takes the following arguments

specc(x, centers, kernel)

What do I put as the first argument? The documentation says that x has to be "the matrix of data to be clustered, or a symbolic description of the model to be fit, or a kernel Matrix of class kernelMatrix, or a list of character vectors". But simply putting my_corpus doesn't work. So I am confused how this works if you have a corpus of documents.

Has QUIT--Anony-Mousse · Answer 1 · 2017-03-06T08:37:21.807

1. Choose an appropriate kernel.
2. Compute the kernel matrix.
3. Run spectral clustering.
4. Evaluate, evaluate, evaluate. Clustering is likely to fail, but it will produce a result nevertheless. And on text, any result can be interpreted to look good... see the two publications on topic modeling with "reading tea leaves" in the title!
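
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the one right way to do it: the toy `docs` vector is a hypothetical stand-in for the files on disk, and cosine similarity on TF-IDF vectors is just one possible kernel choice that the answer does not prescribe.

```r
library(tm)       # corpus handling and document-term matrix
library(kernlab)  # specc() and as.kernelMatrix()

# Hypothetical toy documents standing in for the files read by DirSource().
docs <- c("cats purr and chase mice",
          "dogs bark and chase cats",
          "kittens purr and mice hide",
          "stocks fell as markets closed",
          "markets rallied and stocks rose",
          "investors sold stocks as markets fell")
my_corpus <- VCorpus(VectorSource(docs))

# Step 1: choose a kernel. Assumption: cosine similarity on TF-IDF vectors.
dtm <- DocumentTermMatrix(my_corpus, control = list(weighting = weightTfIdf))
m <- as.matrix(dtm)
m <- m / sqrt(rowSums(m^2))   # L2-normalise each document (row) vector

# Step 2: compute the kernel matrix, sim[i, j] = K(document i, document j).
sim <- tcrossprod(m)          # dot products of unit vectors = cosine similarity
K <- as.kernelMatrix(sim)     # mark it as a precomputed kernel matrix

# Step 3: spectral clustering on the precomputed kernel matrix.
set.seed(1)                   # the k-means step inside specc() is randomised
sc <- specc(K, centers = 2)

# Step 4: evaluate by reading the documents assigned to each cluster.
print(split(docs, sc@.Data))
```

Whether the resulting groups mean anything still has to be checked by hand, for instance by reading the documents in each cluster and comparing runs with different kernels or numbers of centers.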

edited Mar 06 '17 at 08:37

answered Mar 06 '17 at 07:59


My question is how to compute the kernel matrix if you have a corpus of documents. Do you know this? – vdvaxel Mar 06 '17 at 13:33
Whichever way you want. That's straightforward. You need the K(i,j) for any two documents, store them in a matrix. – Has QUIT--Anony-Mousse Mar 07 '17 at 07:23
What do you mean by K(i,j) though? Is there a standard function to convert a matrix to a kernel matrix? – vdvaxel Mar 07 '17 at 09:01
K is the kernel function you want to use. – Has QUIT--Anony-Mousse Mar 07 '17 at 22:29
First, try working with your document-term matrix: dtm <- DocumentTermMatrix(my_corpus) – Selcuk Akbas Mar 06 '18 at 22:36

score 0 · Answer 2 · answered Nov 29 '19 at 07:10

x must be a matrix or a data frame. A corpus is neither of them. You should transform the corpus into a document-term matrix and then convert it to matrix format.
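
A minimal sketch of that conversion, under some assumptions: the toy `docs` vector is a hypothetical stand-in for the real files, and the RBF kernel with `sigma = 0.05` is an arbitrary illustrative setting, not a recommended value.

```r
library(tm)       # VCorpus(), DocumentTermMatrix()
library(kernlab)  # specc()

# Hypothetical toy documents; with real files, build my_corpus via DirSource().
docs <- c("cats purr and chase mice",
          "dogs bark and chase cats",
          "kittens purr and mice hide",
          "stocks fell as markets closed",
          "markets rallied and stocks rose",
          "investors sold stocks as markets fell")
my_corpus <- VCorpus(VectorSource(docs))

# Corpus -> document-term matrix -> ordinary numeric matrix.
dtm <- DocumentTermMatrix(my_corpus)
x <- as.matrix(dtm)           # one row per document, one column per term

# Pass the plain matrix as the first argument of specc().
# Assumption: RBF kernel with a hand-picked sigma, chosen only for illustration.
set.seed(1)
sc <- specc(x, centers = 2, kernel = "rbfdot", kpar = list(sigma = 0.05))

# Cluster label assigned to each document.
print(data.frame(doc = rownames(x), cluster = sc@.Data))
```

Raw term counts are used here for brevity; a different weighting (for example TF-IDF) or a different kernel may suit real text better.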

Gonzalo Pellejero

Welcome to SO. Can you please provide an example? – TungstenX Nov 29 '19 at 07:19
