I have 2 issues.
When I try to split my data into test and train sets using sample.split as below, the resulting split sizes don't match what I expect. The data d has 392 rows, so a 4:1 split should put 0.8 * 392 = 313.6, i.e. 313 or 314, rows in the training set, but the length shown is 304. Is there something I might be missing?
require(caTools)
set.seed(101)
samplev = sample.split(d[,], SplitRatio = 0.80)
train = subset(d, samplev == TRUE)
test = subset(d, samplev == FALSE)
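For reference, this is the row count I expected from an 80/20 split, sketched with base R's sample() rather than caTools. The data frame `d_demo` and its columns are made up for illustration (they are not my actual data); only the row count of 392 matches.

```r
# Stand-in data frame with 392 rows (hypothetical; not the real data)
set.seed(101)
d_demo <- data.frame(x = rnorm(392), y = rbinom(392, 1, 0.5))

# Draw 80% of the row indices without replacement
n_train <- floor(0.8 * nrow(d_demo))
train_idx <- sample(seq_len(nrow(d_demo)), size = n_train)

# Rows selected go to train, the rest go to test
train_demo <- d_demo[train_idx, ]
test_demo  <- d_demo[-train_idx, ]

nrow(train_demo)  # 313
nrow(test_demo)   # 79
```

Here the split is made over row indices, so the two parts always add up to 392 rows.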
I'm trying to use the split data for a logistic regression task in R, as follows:
# Training
m <- glm(mpg01 ~ . - name, data = train, family = binomial(link = 'logit'))
out2 <- predict.glm(m, test, type = "response")
class2 <- vector()
for (i in 1:length(out2)) {
  if (out2[i] >= 0.5) {
    class2[i] <- 1
  } else {
    class2[i] <- 0
  }
}
r2 <- table(class2, test$mpg01)  # confusion matrix
The idea is to exclude the 'name' column from training. When I run the fitted model on the test data, I get the following:
out2 <- predict.glm(m, test, type = "response")
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
factor name has new levels amc ambassador sst, amc concord dl 6, amc pacer, amc pacer d/l, amc rebel sst, audi 100 ls, audi 5000, buick century 350, buick century limited, cadillac seville, capri ii, chevrolet bel air, chevrolet cavalier, chevrolet cavalier wagon, chevrolet monte carlo, chevrolet vega 2300, chrysler lebaron town @ country (sw), chrysler new yorker brougham, datsun 510 hatchback, datsun b210 gx, datsun f-10 hatchback, dodge aries wagon (sw), dodge aspen 6, dodge colt hardtop, dodge colt m/m, dodge dart custom, dodge magnum xe, dodge rampage, fiat 124 tc, ford mustang, ford mustang ii, ford ranger, honda civic 1500 gl, maxda rx3, mazda 626, mazda glc 4, mazda glc custom, mercedes-benz 240d, mercedes-benz 280s, mercury capri 2000, mercury marquis, oldsmobile cutlass ciera (diesel), peugeot 505s turbo diesel, plymouth 'cuda 340, plymouth fury gran sedan, plymouth grand fury, plymouth horizon, plymouth horizon miser, plymouth horizon tc3, plymouth satellite, plymo
As I understand it, this error shouldn't appear, since the 'name' column is excluded from the model. Or, if it is somehow still being used unintentionally, what am I doing wrong?
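One workaround I am considering is to drop the column from both data frames before fitting, so the formula never sees it. This is only a sketch under the assumption that 'name' is still reaching the model frame despite the `- name` term; I have not confirmed that this is the actual cause. The data frame `demo` and its `weight` column are hypothetical stand-ins for my data.

```r
# Hypothetical stand-in data: a binary response, one numeric predictor,
# and a factor 'name' column with a distinct level in every row
set.seed(1)
demo <- data.frame(
  mpg01  = rbinom(40, 1, 0.5),
  weight = rnorm(40),
  name   = factor(paste0("car", 1:40))
)
train_demo <- demo[1:30, ]
test_demo  <- demo[31:40, ]

# Remove 'name' from both sets instead of excluding it in the formula
keep <- setdiff(names(demo), "name")
train_nn <- train_demo[, keep]
test_nn  <- test_demo[, keep]

# Fit on the reduced training set and predict on the reduced test set
m_demo <- glm(mpg01 ~ ., data = train_nn, family = binomial(link = "logit"))
p_demo <- predict(m_demo, newdata = test_nn, type = "response")

# Vectorised thresholding at 0.5, then the confusion matrix
class_demo <- ifelse(p_demo >= 0.5, 1, 0)
table(class_demo, test_nn$mpg01)
```

The ifelse() call replaces the explicit loop over predictions but applies the same 0.5 threshold.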