
I have a training set that looks like this:

Name     Day       Area     X        Y      Month  Night
ATTACK   Monday    LA       -122.41  37.78  8      0
VEHICLE  Saturday  CHICAGO  -1.67    3.15   2      0
MOUSE    Monday    TAIPEI   -12.5    3.1    9      1

`Name` is the outcome (dependent) variable. I converted `Name`, `Area` and `Day` into factors, but I wasn't sure whether I should do the same for `Month` and `Night`, which only take integer values 1-12 and 0-1, respectively.
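To see what converting `Month` and `Night` to factors actually changes, here is a minimal base-R sketch on a three-row toy copy of the table above (the toy data and the variable names `month_numeric`, `month_factor`, etc. are illustrative, not from the original). It compares the columns `model.matrix` produces for the integer and factor encodings.

```r
# Toy copy of the training set shown above (three rows only).
trainDF <- data.frame(
  Name  = c("ATTACK", "VEHICLE", "MOUSE"),
  Day   = c("Monday", "Saturday", "Monday"),
  Area  = c("LA", "CHICAGO", "TAIPEI"),
  X     = c(-122.41, -1.67, -12.5),
  Y     = c(37.78, 3.15, 3.1),
  Month = c(8L, 2L, 9L),
  Night = c(0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Design matrices with Month and Night still stored as integers.
month_numeric <- model.matrix(~ Month, data = trainDF)
night_numeric <- model.matrix(~ Night, data = trainDF)

# Convert every categorical column, including Month and Night, to a factor.
cat_cols <- c("Name", "Day", "Area", "Month", "Night")
trainDF[cat_cols] <- lapply(trainDF[cat_cols], factor)

# Design matrices with Month and Night stored as factors.
month_factor <- model.matrix(~ Month, data = trainDF)
night_factor <- model.matrix(~ Night, data = trainDF)

ncol(month_numeric)  # 2: intercept + one slope for Month
ncol(month_factor)   # 3: intercept + dummies for levels "8" and "9" ("2" is the reference)
ncol(night_numeric)  # 2
ncol(night_factor)   # 2: a 0/1 factor gives the same single 0/1 column
```

With all twelve months present, the factor version of `Month` would add 11 dummy columns, so the choice matters for `Month`; for the 0/1 `Night` variable both encodings produce the same column.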

I then convert the data into model matrices:

ynn <- model.matrix(~Name , data = trainDF)
mnn <- model.matrix(~ Day+Area +X + Y + Month + Night, data = trainDF)
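For reference, this is roughly what `ynn` looks like. The sketch below (base R, with a hypothetical three-row `trainDF` containing only `Name`) shows that `model.matrix(~Name, ...)` returns a matrix with an intercept column plus one dummy column per non-reference level, rather than one label per row.

```r
# Hypothetical three-row version of the outcome column.
trainDF <- data.frame(Name = factor(c("ATTACK", "VEHICLE", "MOUSE")))

ynn <- model.matrix(~ Name, data = trainDF)

dim(ynn)        # 3 rows, 3 columns
colnames(ynn)   # "(Intercept)" "NameMOUSE" "NameVEHICLE"
is.matrix(ynn)  # TRUE: dummy columns, not a single outcome vector
```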

I then set up the tuning parameters:

nnTrControl = trainControl(method = "repeatedcv", number = 3, repeats = 5,
                           verboseIter = TRUE, returnData = FALSE,
                           returnResamp = "all", classProbs = TRUE,
                           summaryFunction = multiClassSummary,
                           allowParallel = TRUE)
nnGrid = expand.grid(.size = c(1, 4, 7), .decay = c(0, 0.001, 0.1))
model <- train(y = ynn, x = mnn, method = 'nnet', linout = TRUE, trace = FALSE,
               trControl = nnTrControl, metric = "logLoss", tuneGrid = nnGrid)
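As a rough sense of how much work this tuning setup implies, the arithmetic below counts the model fits, assuming each grid row is fitted once per resample of the 3-fold cross-validation repeated 5 times. The helper names `n_folds`, `n_repeats`, `n_candidates` and `n_resamples` are illustrative only.

```r
# Same grid as above: 3 values of size x 3 values of decay.
nnGrid <- expand.grid(.size = c(1, 4, 7), .decay = c(0, 0.001, 0.1))

n_folds   <- 3  # number = 3 in trainControl()
n_repeats <- 5  # repeats = 5 in trainControl()

n_candidates <- nrow(nnGrid)         # 9 parameter combinations
n_resamples  <- n_folds * n_repeats  # 15 resamples
n_candidates * n_resamples           # 135 fits during tuning
```

On top of these, `train` fits the selected model once more on the full training set.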

However, the `model <- train(...)` call fails with `Error: nrow(x) == n is not TRUE`.

I get a similar error if I use `xgboost` instead of `nnet`.

Does anyone know what's causing this?
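The error message suggests that `train` compares the number of rows of `x` with a sample count `n` taken from `y`. Assuming `n` is computed as `length(y)` (an assumption based on the message, not confirmed here), a matrix outcome cannot pass that check, because `length()` of a matrix counts every cell. A base-R sketch on hypothetical toy data:

```r
# Hypothetical three-row version of the training data.
trainDF <- data.frame(
  Name = factor(c("ATTACK", "VEHICLE", "MOUSE")),
  Day  = factor(c("Monday", "Saturday", "Monday")),
  X    = c(-122.41, -1.67, -12.5)
)

ynn <- model.matrix(~ Name, data = trainDF)     # 3 x 3 matrix outcome
mnn <- model.matrix(~ Day + X, data = trainDF)  # 3-row predictor matrix

# Matrix outcome: length() counts rows x columns.
nrow(mnn) == length(ynn)           # FALSE: 3 vs 9

# Vector outcome: one label per row, so the counts agree.
nrow(mnn) == length(trainDF$Name)  # TRUE: 3 vs 3
```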

Jaap
user5739619
  • Not sure about the error, but you should convert `Month` and `Night` to factor variables too. – ytk Feb 20 '16 at 18:51
  • I just did that. That didn't solve the error – user5739619 Feb 20 '16 at 18:58
  • `y` should be a numeric or factor vector containing the outcome for each sample, not a matrix. Try `train(y = trainDF$Name, ...`; it gives different errors with your example data but perhaps it will work with a full dataset. – Julius Vainora Feb 20 '16 at 19:36
  • Trying that I get the error `At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to ATTACK, VEHICLE, MOUSE, ... . Please use factor levels that can be used as valid R variable names (see ?make.names for help).` But `Name` already is a factor, according to `str(trainDF$Name)`. So I don't understand this error – user5739619 Feb 20 '16 at 19:42
  • I see, maybe it's because some of the values in `Name` are invalid. Some of its values are `Hit Run`, `Home Run`, etc. So maybe the spaces are causing the problem? How can I fix that? – user5739619 Feb 20 '16 at 19:46
  • @user5739619, so, did my answer help? – Julius Vainora Feb 21 '16 at 13:49

2 Answers


`y` should be a numeric or factor vector containing the outcome for each sample, not a matrix. Using

train(y = make.names(trainDF$Name), ...)

helps, because `make.names` modifies the values so that they are syntactically valid R variable names.
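A small base-R sketch of what `make.names` does to outcome labels like the ones mentioned in the comments (the vector `raw_names` is hypothetical example data). Wrapping the result in `factor()` is a conservative extra step so that `y` is explicitly a factor; it is not part of the answer above.

```r
# Hypothetical outcome values, including labels with spaces.
raw_names <- c("ATTACK", "Hit Run", "Home Run", "Hit Run", "VEHICLE")

# Default unique = FALSE, so repeated labels stay repeated (one per row).
clean <- make.names(raw_names)
clean      # "ATTACK" "Hit.Run" "Home.Run" "Hit.Run" "VEHICLE"

y <- factor(clean)
levels(y)  # the four distinct, now valid, class labels

# Every level is already a syntactically valid R name.
all(make.names(levels(y)) == levels(y))  # TRUE
```

Because invalid characters are replaced by `.`, two different raw labels such as `Hit Run` and `Hit-Run` would both become `Hit.Run`, so it is worth checking that the number of levels is unchanged after cleaning.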

Julius Vainora

Even though the help page for `train` says that `x` can be either a matrix or a data frame, you can try converting the matrix to a data frame:

model <- train(y=ynn, x=as.data.frame(mnn), method='nnet', linout=TRUE, trace = FALSE,
               trControl = nnTrControl, metric="logLoss", tuneGrid=nnGrid)
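A quick base-R check of what the conversion does, using a hypothetical three-row `trainDF`: `as.data.frame` turns the model matrix into a data frame with the same number of rows and columns.

```r
# Hypothetical three-row version of two predictor columns.
trainDF <- data.frame(
  Day = factor(c("Monday", "Saturday", "Monday")),
  X   = c(-122.41, -1.67, -12.5)
)
mnn <- model.matrix(~ Day + X, data = trainDF)

x_df <- as.data.frame(mnn)

is.data.frame(x_df)            # TRUE
all(dim(x_df) == dim(mnn))     # TRUE: same rows and columns
```

Since the row count is unchanged, this conversion is presumably most useful in combination with a vector `y`, such as `make.names(trainDF$Name)`.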
Sixiang.Hu