Generally it is the terms component that holds that sort of information. See ?rpart::rpart.object.
fit <- rpart::rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit$terms # notice that the attribute dataClasses has the information
attr(fit$terms, "dataClasses")
#------------
Kyphosis Age Number Start
"factor" "numeric" "numeric" "numeric"
That example doesn't have a csplit component in its structure because none of the predictor variables are factors. You could make one fairly easily:
> fit <- rpart::rpart(Kyphosis ~ Age + factor(findInterval(Number,c(0,4,6,Inf))) + Start, data = kyphosis)
> fit$csplit
[,1] [,2] [,3]
[1,] 1 1 3
[2,] 1 1 3
[3,] 3 1 3
[4,] 1 3 3
[5,] 3 1 3
[6,] 3 3 1
[7,] 3 1 3
[8,] 1 1 3
> attr(fit$terms, "dataClasses")
Kyphosis
"factor"
Age
"numeric"
factor(findInterval(Number, c(0, 4, 6, Inf)))
"factor"
Start
"numeric"
The integers are not factor level codes. Per ?rpart::rpart.object, each row of csplit is one categorical split and each column is one level of the factor: a 1 means that level goes to the left, a 3 means it goes to the right, and a 2 means the level is not included in the split. So the "mapping" to the levels() of a factor is by column position, and the level names belong on the columns. If I were trying to construct a character matrix version of the fit$csplit matrix that carried the names of the levels in a factor variable, this would be one path to success:
> kyphosis$Numlev <- factor(findInterval(kyphosis$Number, c(0, 4, 6, Inf)), labels=c("low","med","high"))
> str(kyphosis)
'data.frame': 81 obs. of 5 variables:
$ Kyphosis: Factor w/ 2 levels "absent","present": 1 1 2 1 1 1 1 1 1 2 ...
$ Age : int 71 158 128 2 1 1 61 37 113 59 ...
$ Number : int 3 3 4 5 4 2 2 3 2 6 ...
$ Start : int 5 14 5 1 15 16 17 16 16 12 ...
$ Numlev : Factor w/ 3 levels "low","med","high": 1 1 2 2 2 1 1 1 1 3 ...
> fit <- rpart::rpart(Kyphosis ~ Age + Numlev + Start, data = kyphosis)
> Levels <- fit$csplit
> Levels[] <- c("left", "unused", "right")[Levels]
> colnames(Levels) <- levels(kyphosis$Numlev)
> Levels
low med high
[1,] "left" "left" "right"
[2,] "left" "left" "right"
[3,] "right" "left" "right"
[4,] "left" "right" "right"
[5,] "right" "left" "right"
[6,] "right" "right" "left"
[7,] "right" "left" "right"
[8,] "left" "left" "right"
Response to comment: If you only have the model, then use str() to look at it. Near the end of the str() output for the example I created, just after the "ordered" component, the factor labels are stored in an attribute of the fit named "xlevels":
$ ordered : Named logi [1:3] FALSE FALSE FALSE
..- attr(*, "names")= chr [1:3] "Age" "Numlev" "Start"
- attr(*, "xlevels")=List of 1
..$ Numlev: chr [1:3] "low" "med" "high"
- attr(*, "ylevels")= chr [1:2] "absent" "present"
- attr(*, "class")= chr "rpart"
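If only the fitted model is at hand, the "xlevels" attribute and the splits matrix together give enough to label fit$csplit without going back to the data. What follows is a minimal sketch, not a general solution: it assumes a single factor predictor (so every csplit column corresponds to one of its levels), and the names xlev, sp, is_cat and labelled are purely illustrative. It relies on what ?rpart::rpart.object documents: rows of fit$splits with ncat greater than 1 are categorical splits, and their index column gives the matching row of fit$csplit.

```r
library(rpart)  # also supplies the kyphosis data set

# Rebuild the three-level factor used in the example above
kyphosis$Numlev <- factor(findInterval(kyphosis$Number, c(0, 4, 6, Inf)),
                          labels = c("low", "med", "high"))
fit <- rpart(Kyphosis ~ Age + Numlev + Start, data = kyphosis)

# Level names are stored on the fitted object itself
xlev <- attr(fit, "xlevels")

# Categorical splits have ncat > 1; "index" is then a row number of csplit
sp <- fit$splits
is_cat <- sp[, "ncat"] > 1

labelled <- fit$csplit
row_names <- character(nrow(labelled))
row_names[sp[is_cat, "index"]] <- rownames(sp)[is_cat]
rownames(labelled) <- row_names

# Assumption: only one factor predictor, so its levels name the columns
colnames(labelled) <- xlev[["Numlev"]]

labelled
```

With more than one factor predictor, the column labels would have to be chosen row by row from xlev, using the variable name attached to each row.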