I am using the tm package to create a corpus of documents and I want to use spectral clustering (kernlab package) for text classification.

So, if I have a corpus

my_corpus = VCorpus(DirSource(directory="C:/Users/me/Desktop/Documents", pattern="txt"))

And I want to perform spectral clustering using the specc function which takes the following arguments

specc(x, centers, kernel)

What do I put as the first argument? The documentation says that x has to be "the matrix of data to be clustered, or a symbolic description of the model to be fit, or a kernel Matrix of class kernelMatrix, or a list of character vectors". But simply putting my_corpus doesn't work. So I am confused how this works if you have a corpus of documents.

vdvaxel

2 Answers

  1. Choose an appropriate kernel

  2. Compute kernel matrix

  3. Spectral clustering

  4. Evaluate, evaluate, evaluate. Clustering is likely to fail but still produce a result. And on text, any result can be interpreted to look good... see the two publications on topic modeling with "reading tea leaves" in the title!
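The four steps above can be sketched roughly as follows. This is only an illustration under a few assumptions that are not in the question: the toy documents are made up, the bag-of-words counts are built with base R rather than tm, and cosine similarity is picked as the kernel simply because it is a common choice for text. The kernel matrix is wrapped with kernlab's `as.kernelMatrix` so that `specc` treats it as precomputed.

```r
library(kernlab)

# Toy documents, made up for illustration: three about pets, three about finance.
docs <- c(
  "the cat chased the mouse",
  "the cat and the kitten slept",
  "the kitten chased the cat",
  "the stock market fell sharply",
  "the stock price and the market rose",
  "the market price of the stock fell"
)

# Bag-of-words counts built with base R (rows = documents, columns = terms).
tokens <- strsplit(tolower(docs), "[^a-z]+")
vocab  <- sort(unique(unlist(tokens)))
counts <- t(vapply(
  tokens,
  function(tk) as.numeric(table(factor(tk, levels = vocab))),
  numeric(length(vocab))
))

# Steps 1 and 2: cosine similarity as the kernel, computed for all document pairs.
norms <- sqrt(rowSums(counts^2))
S <- (counts %*% t(counts)) / (norms %o% norms)
K <- as.kernelMatrix(S)

# Step 3: spectral clustering on the precomputed kernel matrix.
set.seed(42)  # the k-means step inside specc is randomized
sc <- specc(K, centers = 2)
labels <- sc@.Data  # integer cluster label per document

# Step 4: inspect which documents ended up together.
print(split(docs, labels))
```

Reading the grouped documents is only a first sanity check. Note also that a document sharing no terms with any other gets a zero row in the affinity matrix, which can break the normalization inside `specc`, so such documents are worth filtering out beforehand.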

Has QUIT--Anony-Mousse

x requires a matrix or a data frame. A corpus is neither of them. You should transform the corpus into a document-term matrix and then convert it to matrix format.
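A minimal sketch of that conversion, assuming a small made-up corpus built with `VectorSource` so the example is self-contained (the `my_corpus` from the question could be used in its place). The kernel and the `sigma` value are arbitrary choices for illustration, not recommendations:

```r
library(tm)
library(kernlab)

# Toy documents, made up for illustration.
docs <- c(
  "the cat chased the mouse",
  "the cat and the kitten slept",
  "the kitten chased the cat",
  "the stock market fell sharply",
  "the stock price and the market rose",
  "the market price of the stock fell"
)
my_corpus <- VCorpus(VectorSource(docs))

# Corpus -> document-term matrix (term frequencies) -> plain numeric matrix.
dtm <- DocumentTermMatrix(my_corpus)
m   <- as.matrix(dtm)  # rows = documents, columns = terms

# Spectral clustering on the matrix; kernel parameters are set explicitly
# because the automatic estimate may be unreliable on so few rows.
set.seed(42)  # the k-means step inside specc is randomized
sc <- specc(m, centers = 2, kernel = "rbfdot", kpar = list(sigma = 0.2))
labels <- sc@.Data  # integer cluster label per document

print(data.frame(doc = rownames(m), cluster = labels))
```

For a real corpus it is probably worth adding preprocessing (stop-word removal, TF-IDF weighting) before the `as.matrix` step, since raw counts are dominated by frequent words.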