I am training a random forest model on my data with the randomForest package. Some of my predictors are of class character. I am pretty sure that randomForest only accepts factor and numeric predictors, so I think R automatically coerces the character variables to numeric.
To understand how this may affect my modelling results: does anyone know how R coerces character to numeric here (the algorithm or rule it follows)? Or is there source code I can look at?
I am using R version 4.0.1.
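Below is a minimal sketch of one plausible rule. It assumes, without confirming this in the randomForest source, that the predictor data frame is converted with base R's data.matrix(), whose documentation says character columns are first converted to factors and then to integers. The data frame and the column names colour and size are hypothetical, made up for illustration. Note that factor() sorts the levels using the current locale's collation.

```r
# Hypothetical toy data: one character predictor and one numeric predictor.
df <- data.frame(
  colour = c("red", "blue", "green", "blue"),
  size   = c(1.5, 2.0, 0.5, 3.0),
  stringsAsFactors = FALSE  # already the default in R >= 4.0.0
)

# data.matrix() replaces each character column by factor codes:
# the levels are the sorted unique strings, and each value becomes its level index.
m <- data.matrix(df)
print(m)

# The same codes computed explicitly, to show the rule.
lev   <- levels(factor(df$colour))
codes <- as.integer(factor(df$colour))
print(lev)    # "blue"  "green" "red"
print(codes)  # 3 1 2 1
stopifnot(all(m[, "colour"] == codes))

# A numeric split such as colour <= 1.5 on these codes sends "blue" one way and
# "green"/"red" the other: the strings act as an ordered (alphabetical) variable.
goes_left <- m[, "colour"] <= 1.5
print(df$colour[goes_left])  # "blue" "blue"
```

If this assumption holds, a character predictor is treated like an ordinal variable whose order is alphabetical, which would explain non-integer split points such as 1.5 or 2.5 lying between two codes.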
Thanks in advance.
Update: I checked the first tree using
getTree(mod, 1, labelVar = TRUE)
I can see that if I convert those character variables to factors first, the "split point" in the output is an integer, which indicates a categorical split (see: https://www.rdocumentation.org/packages/randomForest/versions/4.6-14/topics/getTree). If I leave them as character, the "split point" is not an integer.
So my guess is that R coerces the values of those character variables to numeric values. But how?
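For comparison, the getTree documentation linked above says that for a categorical predictor the split point is an integer whose binary expansion identifies which categories go to the left or right daughter node. A small base R sketch of decoding such an integer follows; the levels and the split value 5 are hypothetical, and which side the flagged levels go to is not asserted here.

```r
# Hypothetical factor levels and a hypothetical integer split point of the
# kind getTree() reports for a factor predictor.
lev <- c("blue", "green", "red")
split_point <- 5L

# intToBits() returns the bits least-significant first; 5 is binary 101,
# so bits 1 and 3 are set, flagging levels 1 and 3.
bits <- as.integer(intToBits(split_point))[seq_along(lev)]
flagged <- lev[bits == 1L]
print(bits)     # 1 0 1
print(flagged)  # "blue" "red"
```

Because a factor split can group any subset of levels (here "blue" and "red" together), it does not depend on the alphabetical order of the strings, whereas a split on integer codes can only cut that order at a single point.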