I have a dataset with 9 features, from x1 to x9. The target variable is Target (I have a classification problem). The code:
# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Target, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
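# Feature scaling: reuse the training-set centre and scale for the test set
scaled_train = scale(training_set[-c(2,5)])
training_set[-c(2,5)] = scaled_train
test_set[-c(2,5)] = scale(test_set[-c(2,5)],
                          center = attr(scaled_train, "scaled:center"),
                          scale = attr(scaled_train, "scaled:scale"))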
# Fitting Decision Tree Classification to the Training set
# install.packages('rpart')
library(rpart)
classifier = rpart(formula = Target ~ .,
                   data = training_set)
# Predicting the Test set results
y_pred = predict(classifier, newdata = test_set[-2], type = 'class')
# Making the Confusion Matrix
cm = table(test_set[, 2], y_pred)
# Plotting the tree
plot(classifier, uniform = TRUE, margin = 0.2)
text(classifier)
This produces the tree plot. In the summary output below, however, I see only 7 variables sorted by importance. The first question is: why only 7, when there are 9 features?
summary(classifier)
Variable importance
x7 x6 x4 x1 x3 x2 x5
27 18 17 14 11  9  4
Moreover (this is the second question), x3 is missing from the plot, even though it appears in the importance list. Why?
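One way to investigate both questions is to compare the variables that rpart uses for primary splits (the ones drawn in the plot) with the variables that receive an importance score. The sketch below runs on the built-in iris data, which is an assumption made purely for illustration since the original dataset is not available; the names `fit`, `used_in_splits`, `surrogate_only` and `never_used` are hypothetical. It relies on the `frame` and `variable.importance` components of a fitted rpart object. Importance in rpart also credits surrogate splits, so a variable can have a score without ever appearing in the plotted tree, and a predictor that is never a primary or surrogate split gets no importance entry at all.

```r
# Illustrative sketch on the built-in iris data (not the dataset from the question).
library(rpart)

fit <- rpart(Species ~ ., data = iris)

# Variables used as primary splits; these are the ones shown by plot()/text().
# "<leaf>" marks terminal nodes, so it is dropped.
used_in_splits <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")

# Raw importance scores; these also include credit from surrogate splits.
importance <- fit$variable.importance

# Variables that have an importance score but never appear in the plotted tree.
surrogate_only <- setdiff(names(importance), used_in_splits)

# Predictors with no importance entry (never a primary or surrogate split).
predictors <- attr(fit$terms, "term.labels")
never_used <- setdiff(predictors, names(importance))

print(used_in_splits)
print(surrogate_only)
print(never_used)
```

Applied to the classifier from the question, the same comparison would show whether x3 falls into the surrogate-only group and whether x8 and x9 have no importance entry. `printcp(classifier)` also lists the variables actually used in tree construction.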
The dataset is too big to post here, but I would like to know whether something similar has happened to you and whether you have found a possible explanation.
Thank you!