I am trying to run `knnreg` from the caret package. For some reason, this training set works:
```
> summary(train1)
      V1               V2               V3
13     : 10474   1      :  6435   7      :  8929
10     : 10315   2      :  6435   6      :  8895
4      : 10272   3      :  6435   9      :  8892
1      : 10244   4      :  6435   10     :  8892
2      : 10238   7      :  6435   15     :  8874
24     : 10228   8      :  6435   40     :  8870
(Other):359799   (Other):382960   (Other):368218
```
While this one won't work:
```
> summary(train2)
      V1               V2               V3                V4
13     : 10474   1      :  6436   7      :  8929   Christmas   :  5946
10     : 10315   2      :  6436   6      :  8895   Labor Day   :  8861
4      : 10272   3      :  6438   9      :  8892   None        :391909
1      : 10244   4      :  6435   10     :  8892   Super Bowl  :  8895
2      : 10238   7      :  6435   15     :  8874   Thanksgiving:  5959
24     : 10228   8      :  6435   40     :  8870
(Other):359799   (Other):382960   (Other):368218
```
Here is the target vector:
```
> summary(Target)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   -499     200     712    1980   20210  693100
```
The error I get is during the prediction phase:
```
> fit <- knnreg(train2, Target, k = 2)
> Prediction <- predict(fit, newdata=test)
Error in knnregTrain(train = list(V1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,  :
  NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning messages:
1: In knnregTrain(train = list(V1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,  :
  NAs introduced by coercion
2: In knnregTrain(train = list(V1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,  :
  NAs introduced by coercion
```
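The "NAs introduced by coercion" warnings suggest that the factor columns are converted to numbers somewhere inside `knnregTrain()`. The sketch below reproduces that symptom in base R, assuming (not confirmed here from caret's source) that the data frame goes through `as.matrix()` and then `as.double()` before reaching the compiled code. The two-row data frame `df` is a hypothetical stand-in for `train2`:

```r
# Hypothetical two-row stand-in for train2 (not the real data).
df <- data.frame(
  V1 = factor(c("13", "10")),
  V4 = factor(c("Christmas", "None"))
)

# Assumption: the features are handled roughly like this internally.
# A data frame that contains factors becomes a *character* matrix holding
# the factor labels (not the integer level codes).
m <- as.matrix(df)
print(storage.mode(m))   # "character"

# Numeric-looking labels parse as numbers; text labels become NA and
# trigger the warning "NAs introduced by coercion".
x <- suppressWarnings(as.double(m))
print(x)                 # 13 10 NA NA
```

If that assumption holds, labels such as `13` in V1 to V3 would be compared as the numbers they spell, while labels such as `Christmas` in V4 would turn into `NA`, which would match both the error and the observation that relabeling V4 as '1', '2', ... avoids it.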
And this is my test set:
```
> summary(test)
     V1              V2               V3                V4
13     : 2836   1      :  1755   51     : 3002   Christmas   :  2988
4      : 2803   2      :  1755   49     : 2989   Labor Day   :     0
19     : 2799   3      :  1755   52     : 2988   None        :106136
2      : 2797   4      :  1755   50     : 2986   Super Bowl  :  2964
27     : 2791   7      :  1755   6      : 2984   Thanksgiving:  2976
24     : 2790   8      :  1755   47     : 2976
(Other):98248   (Other):104534   (Other):97139
```
What am I missing?
EDIT: Relabeling the V4 levels as '1', '2', ... actually fixes the problem. Does the algorithm treat my features as numeric even though they are factors?
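If the factors are indeed read as numbers, one common alternative is to expand each factor into 0/1 indicator columns before fitting, so that no level counts as "larger" than another. The sketch below does this in base R with a hypothetical helper `one_hot()` and hypothetical miniature data frames `train_small` and `test_small`. It also re-levels the test columns with the training levels, so both matrices get the same columns in the same order even when a level (such as `Labor Day` in `test`) never occurs in one of them:

```r
# Hypothetical miniature versions of train2 and test (not the real data).
train_small <- data.frame(
  V1 = factor(c("13", "10", "4", "13")),
  V4 = factor(c("Christmas", "None", "Labor Day", "None"))
)
test_small <- data.frame(
  V1 = factor(c("4", "13")),
  V4 = factor(c("None", "Christmas"))
)

# Re-level every test column with the training levels. Test values that
# never occur in training would become NA here and need separate handling.
for (nm in names(train_small)) {
  test_small[[nm]] <- factor(test_small[[nm]], levels = levels(train_small[[nm]]))
}

# Hypothetical helper: one 0/1 column per factor level, named "<column>_<level>".
one_hot <- function(df) {
  blocks <- lapply(names(df), function(nm) {
    f <- df[[nm]]
    m <- outer(as.character(f), levels(f), "==") * 1  # logical -> 0/1 doubles
    colnames(m) <- paste0(nm, "_", levels(f))
    m
  })
  do.call(cbind, blocks)
}

x_train <- one_hot(train_small)
x_test  <- one_hot(test_small)

print(colnames(x_train))
print(x_test)
```

The resulting numeric matrices could then be used in place of the data frames, for example `knnreg(x_train, Target, k = 2)` followed by `predict(fit, newdata = x_test)`; that usage is untested here, and whether V1 to V3 should also be one-hot encoded or kept as numbers is a separate modelling choice.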