Test dtm1 on the basis of dtm, so that I can predict the categories of dtm1

Question

library functions

     library(tm)
     library(e1071)
     library(plyr)

Inserting the journal names to be categorized

sample = c(
    "An Inductive Inference Machine",
    "Computing Machinery and Intelligence",
    "On the translation of languages from left to right",
    "First Draft of a Report on the EDVAC",
    "The Rendering Equation")
corpus <- Corpus(VectorSource(sample))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, content_transformer(tolower))  # wrapper keeps the documents as tm text documents (tm >= 0.6)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument,language="english")
corpus <- tm_map(corpus, stripWhitespace)
dtm <- DocumentTermMatrix(corpus)

term document matrix as training set

inspect(dtm)
Category=c("Machine learning","Artificial intelligence","Compilers","Computer architecture","Computer graphics")

declaration of the categories

my.data=data.frame(as.matrix(dtm),Category)
my.data 
sample = c(
    "gprof: A Call Graph Execution Profiler",
    "Architecture of the IBM System/360",
    "A Case for Redundant Arrays of Inexpensive Disks (RAID)",
    "Determining Optical Flow",
    "A relational model for large shared data banks",
    "some complementarity problems of z and lyoponov like transformations on edclidean jordan algebra")
corpus <- Corpus(VectorSource(sample))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, content_transformer(tolower))  # wrapper keeps the documents as tm text documents (tm >= 0.6)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument,language="english")
corpus <- tm_map(corpus, stripWhitespace)
dtm1 <- DocumentTermMatrix(corpus)

term document matrix as testing set

inspect(dtm1)

Yes, I'm new to this site and will certainly incorporate your suggestions. By the way, I want to predict the categories of the second set using the first set. Please help. – user3675487, May 29 '14 at 03:42

score 0 · Accepted Answer · answered May 27 '14 at 21:39

Well, your sample data has absolutely no overlapping terms, so there's not really much you can do there. The tm library doesn't assign meaning to words; it just measures their correlation. So you need to supply enough overlapping data so that it has a chance of matching up new input to an existing corpus.
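To see the lack of overlap concretely, one option is to intersect the term lists of the two matrices. The sketch below reuses the titles and cleaning steps from the question; `build_dtm`, `train_titles` and `test_titles` are names made up here for illustration, and stemming is skipped if the SnowballC package is not installed.

```r
library(tm)

# Hypothetical helper (not from the question): applies the question's
# cleaning steps to a character vector and returns a document-term matrix.
build_dtm <- function(titles) {
  corpus <- VCorpus(VectorSource(titles))
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  # stemDocument needs SnowballC; skip stemming when it is unavailable
  if (requireNamespace("SnowballC", quietly = TRUE)) {
    corpus <- tm_map(corpus, stemDocument, language = "english")
  }
  corpus <- tm_map(corpus, stripWhitespace)
  DocumentTermMatrix(corpus)
}

train_titles <- c(
  "An Inductive Inference Machine",
  "Computing Machinery and Intelligence",
  "On the translation of languages from left to right",
  "First Draft of a Report on the EDVAC",
  "The Rendering Equation")

test_titles <- c(
  "gprof: A Call Graph Execution Profiler",
  "Architecture of the IBM System/360",
  "A Case for Redundant Arrays of Inexpensive Disks (RAID)",
  "Determining Optical Flow",
  "A relational model for large shared data banks",
  "some complementarity problems of z and lyoponov like transformations on edclidean jordan algebra")

# Terms that occur in both the training and the test vocabulary
overlap <- intersect(Terms(build_dtm(train_titles)), Terms(build_dtm(test_titles)))
print(overlap)
```

An empty result means no test title shares a single term with the training titles, so any classifier trained on `dtm` has nothing to go on for `dtm1`.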

Once you actually have real data, you have many options on how you want to build a model. You can use a kNN classifier like that in the class package, or a decision tree like that in the rpart package, or a neural network like that in the nnet package. There are examples of each of those in this presentation. But it's up to you to decide what's right for your data. That part is not a programming-related question.
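As a rough illustration of the kNN option, here is a minimal sketch. It is not taken from the linked presentation. The titles and labels are invented toy data chosen so that the vocabularies overlap, and `clean`, `train_titles`, `test_titles` and `train_labels` are hypothetical names. The important step is building the test matrix with the training vocabulary as its `dictionary`, so both matrices have the same columns.

```r
library(tm)
library(class)

# Invented toy data for illustration only (not from the question)
train_titles <- c(
  "machine learning with decision trees",
  "learning algorithms for neural networks",
  "compiler design and parsing",
  "parsing techniques for compiler construction")
train_labels <- factor(c("Machine learning", "Machine learning",
                         "Compilers", "Compilers"))
test_titles <- c("neural networks learning", "compiler parsing tables")

# Hypothetical helper: a reduced version of the question's cleaning steps
clean <- function(x) {
  corpus <- VCorpus(VectorSource(x))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  tm_map(corpus, stripWhitespace)
}

dtm_train <- DocumentTermMatrix(clean(train_titles))

# Restrict the test matrix to the training vocabulary; terms unseen in
# training (such as "tables") are dropped, missing ones become zero columns
dtm_test <- DocumentTermMatrix(clean(test_titles),
                               control = list(dictionary = Terms(dtm_train)))

train_mat <- as.matrix(dtm_train)
# Reorder explicitly so the columns match the training matrix one to one
test_mat <- as.matrix(dtm_test)[, colnames(train_mat), drop = FALSE]

# 1-nearest-neighbour prediction on raw term counts
pred <- knn(train = train_mat, test = test_mat, cl = train_labels, k = 1)
print(data.frame(title = test_titles, predicted = pred))
```

The same aligned matrices could be passed to rpart or nnet instead. Which model suits the real data is a separate, statistical decision.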

MrFlick

Can I have your email address? I can mail you the data set, which has enough overlapping terms. Due to lack of space, I can't post it here in the comment box. – user3675487 May 29 '14 at 03:47
@user3675487 I'm sorry, I can't analyze your data for you. I would follow the examples in the presentation I linked. You will need to decide for yourself which method is appropriate for your data. If you are unsure, I suggest you consult a statistician. – MrFlick May 29 '14 at 03:49
Actually, I have tried many combinations: stacking the train and test sets, then using the "head" as the training set. But somehow the output is not what I want. – user3675487 May 29 '14 at 03:54