I get the error below when calling rpart from the rpart library:

dt <- rpart(formula, method="class", data=full.df.allAttr.train);

Error in model.frame.default(formula = formula, data = full.df.allAttr.train,  : 
  object is not a matrix

When I convert full.df.allAttr.train to a matrix, I get a different error:

dt <- rpart(formula, method="class", data= as.matrix( full.df.allAttr.train));

Error in model.frame.default(formula = formula, data = as.matrix(full.df.allAttr.train),  : 
  'data' must be a data.frame, not a matrix or an array

When I check the class, it is a data frame:

class(full.df.allAttr.train)

[1] "data.frame"

Thank you for the input. The error went away when I built the formula with the correct column name, the one that holds the outcomes.

measurevar <- "SpeakerName"
formula_str <- paste(measurevar, paste(rowNames, collapse=" + "), sep=" ~ ")
formula <- as.formula(formula_str) 

It now gives a different error, since my data frame has text row.names. The error and a snapshot of the data are below:

Error in model.frame.default(formula = formula, data = full.df.train,  : 
  variable lengths differ (found for 'character(0)')

[Image: snapshot of the data frame]

Sorry, I am new to this. Here is the full source code and the data sets:

library(tm)
library(rpart)
obamaCorpus <- Corpus(DirSource(directory = "D:/R/Chap 6/Speeches/obama" , encoding="UTF-8"))
romneyCorpus <- Corpus(DirSource(directory = "D:/R/Chap 6/Speeches/romney" , encoding="UTF-8"))
fullCorpus <- c(obamaCorpus,romneyCorpus)#1-22 (obama), 23-44(romney)
fullCorpus.cleansed <- tm_map(fullCorpus, removePunctuation)
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, stripWhitespace)
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, tolower)
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, removeWords, stopwords("english"))
fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, PlainTextDocument)
#fullCorpus.cleansed <- tm_map(fullCorpus.cleansed, stemDocument)

full.dtm <- DocumentTermMatrix(fullCorpus.cleansed)
full.dtm.spars <- removeSparseTerms(full.dtm , 0.6)

full.matix <- data.matrix(full.dtm.spars)
full.df <- as.data.frame(full.matix)

full.df[,"SpeakerName"] <- "obama"
full.df$SpeakerName[21:44] <- "romney"

train.idx <- sample(nrow(full.df) , ceiling(nrow(full.df)* 0.6))
test.idx <- (1:nrow(full.df))[-train.idx]
rowNames <- colnames(full.df)

measurevar <- "SpeakerName"
formula_str <- paste(measurevar, paste(rowNames, collapse=" + "), sep=" ~ ")
formula <- as.formula(formula_str)
dt <- rpart(formula, method="class", data=full.df.train);

It fails at the last step.

The data sets are here: https://drive.google.com/folderview?id=0B1SogodTE-kJSHF6aFRmQURsV0U&usp=sharing

user2478236
  • I imagine that is frustrating. Can you create a reproducible example? – rawr Oct 20 '15 at 13:49
  • check the result of as.matrix( full.df.allAttr.train ) – Ven Yao Oct 20 '15 at 14:19
  • Thank you for the input. The error went away when I created the formula correctly: { measurevar <- "SpeakerName" formula_str <- paste(measurevar, paste(rowNames, collapse=" + "), sep=" ~ ") formula <- as.formula(formula_str) } – user2478236 Oct 20 '15 at 14:24
  • @user2478236 You really need to edit your question and add the `dput` of your data.frame. How are we supposed to use an image? You should add the data.frame or part of the data.frame that creates the error you see. That's how we will be able to help you. – LyzandeR Oct 20 '15 at 14:41

1 Answer

You never create full.df.train, and your formula is malformed.

This will work:

full.df.train <- full.df[train.idx, ]
dt <- rpart(SpeakerName ~ ., method = "class", data = full.df.train)

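The problem with your formula is that SpeakerName appears on both sides of ~, because rowNames is built from colnames(full.df), which includes the SpeakerName column. If you want to use all the other variables as predictors, the . shorthand is much easier and more compact.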
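
If you do want to keep building the formula from column names, a minimal sketch of the idea is to drop the response from the predictor list first. This is an illustration only: it uses the built-in iris data as a stand-in for your document-term data frame, and the names df, response, predictors, fit_dot and fit_built are made up for the example.

```r
library(rpart)

# Illustrative data only: iris stands in for the document-term data frame.
df <- iris
response <- "Species"

# Option 1: the dot expands to every column except the response.
fit_dot <- rpart(Species ~ ., method = "class", data = df)

# Option 2: build the formula from names, dropping the response first
# so that it does not end up on both sides of ~.
predictors <- setdiff(colnames(df), response)
f <- reformulate(predictors, response = response)
fit_built <- rpart(f, method = "class", data = df)

# Compare the predictor names that each fit ended up with.
all.equal(sort(attr(terms(fit_dot), "term.labels")),
          sort(attr(terms(fit_built), "term.labels")))
```

With your data, the equivalent of predictors would be setdiff(colnames(full.df), "SpeakerName"). I have not run this against your data sets, so treat it as a starting point.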

Lluís Ramon