I'm trying to understand the mechanics of how R handles factors as predictor variables. None of what I describe below is necessarily good practice; I'm asking out of sheer curiosity, so I'd appreciate any thoughts. The standard iris dataset in R has the columns Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species, of which the last is a factor. The usual thing to do with this dataset is to demonstrate classification algorithms, say with a neural net or a tree, for example:
rnn1 <- rxNeuralNet(Species ~ Sepal.Length + Sepal.Width + ..., data = iris, numHiddenNodes = 100, numIterations = 1000, type = "classification")
I decided to see what would happen if I reversed this, predicting Petal.Width with Species as one of the predictors:
rnn2 <- rxNeuralNet(Petal.Width ~ Sepal.Length + Sepal.Width + Species, data = iris, numHiddenNodes = 100, numIterations = 1000, type = "regression")
I then created a test data frame:
df1 <- data.frame(Petal.Width=5,Sepal.Length=12,Sepal.Width=3,Species="setosa",Petal.Length=3)
rxPredict() then gives me a score of 0.6058862 for species "setosa". But, very strangely, and this is my question, I can put any string I want for Species and still get a prediction. With Species = "Jack", rxPredict returns a score of 1.545223. This is odd, because in standard R, predicting with a factor level that was not present in the training data throws an error.
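For comparison, here is a minimal base-R sketch of both behaviours. The first part uses lm() as a stand-in for the "standard R" case: predict() checks newdata against the factor levels stored in the fitted model and stops on an unseen level. The second part uses a hypothetical one_hot() helper (an assumption for illustration, not MicrosoftML's actual code) to show one plausible explanation for the rxNeuralNet result: if a categorical input is one-hot encoded against the training levels and an unknown value matches none of them, "Jack" becomes an all-zero indicator vector, and the network still returns a number, just an extrapolated one.

```r
# Part 1: standard R. lm() stores the training factor levels (fit$xlevels),
# and predict() rejects newdata containing a level outside that set.
fit <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Species, data = iris)

known   <- data.frame(Sepal.Length = 5, Sepal.Width = 3, Species = "setosa")
unknown <- data.frame(Sepal.Length = 5, Sepal.Width = 3, Species = "Jack")

# A level seen during training predicts normally
p_known <- predict(fit, newdata = known)

# An unseen level raises an error; capture its message instead of stopping
res <- tryCatch(predict(fit, newdata = unknown),
                error = function(e) conditionMessage(e))
print(res)

# Part 2: HYPOTHETICAL encoder (not MicrosoftML's implementation).
# Indicator columns are built from the training levels only, so a value
# outside those levels matches no column and its row is all zeros.
one_hot <- function(x, train_levels) {
  m <- vapply(train_levels, function(l) as.numeric(x == l),
              numeric(length(x)))
  matrix(m, nrow = length(x), dimnames = list(NULL, train_levels))
}

levs <- levels(iris$Species)
print(one_hot("setosa", levs))  # one indicator set
print(one_hot("Jack", levs))    # no indicator set: all zeros
```

Whether rxNeuralNet really encodes unseen levels this way is an assumption here; the MicrosoftML documentation on how factor columns are featurized would confirm or rule it out.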
Any ideas?
Thanks.