Questions tagged [decision-tree]

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.

In a post, "decision tree" may refer either to the graphical decision-support tool or to the machine-learning algorithm.

2545 questions
8
votes
2 answers

Calculating entropy in decision tree (Machine learning)

I know the formula for calculating entropy: H(Y) = - ∑ (p(yj) * log2(p(yj))). In words: select an attribute and for each value check the target attribute value ... so p(yj) is the fraction of patterns at node N that are in category yj - one for true in target…
code muncher
  • 1,592
  • 2
  • 27
  • 46
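The formula in the excerpt can be sketched directly in Python. This is a minimal illustration, not the asker's code; the 9-positive/5-negative node below is the classic ID3 textbook example, chosen only to show the arithmetic.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) = -sum p(yj) * log2(p(yj)), where p(yj) is the fraction
    of patterns at the node that fall in category yj."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Node with 9 patterns in one class and 5 in the other:
print(round(entropy(["+"] * 9 + ["-"] * 5), 3))  # ≈ 0.940
```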
8
votes
2 answers

Classifying Single Instance in Weka

I trained and created a J48 model using the WEKA GUI. I saved the model file to my computer and now I would like to use it to classify a single instance in my Java code. I would like to get a prediction for the attribute "cluster". What I do is the…
Erol
  • 6,478
  • 5
  • 41
  • 55
8
votes
2 answers

Weighted Decision Trees using Entropy

I'm building a binary classification tree using mutual information gain as the splitting function. But since the training data is skewed toward a few classes, it is advisable to weight each training example by the inverse class frequency. How do I…
Jacob
  • 34,255
  • 14
  • 110
  • 165
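Inverse-class-frequency weighting does not have to be hand-rolled into the entropy computation; scikit-learn's tree exposes the same idea through `class_weight="balanced"`. A minimal sketch (the skewed dataset here is synthetic, purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Skewed binary problem: roughly 90% of samples in class 0.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# class_weight="balanced" weights each sample by the inverse class
# frequency, n_samples / (n_classes * class_count), which is the
# reweighting the question describes; criterion="entropy" uses
# information gain as the splitting function.
clf = DecisionTreeClassifier(criterion="entropy",
                             class_weight="balanced",
                             random_state=0).fit(X, y)
print(clf.get_depth())
```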
7
votes
2 answers

Why such a big pickle of a sklearn decision tree (30K times bigger)?

Why can pickling a sklearn decision tree generate a pickle thousands of times bigger (in terms of memory) than the original estimator? I ran into this issue at work where a random forest estimator (with 100 decision trees) over a dataset with around…
pietroppeter
  • 1,433
  • 13
  • 30
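One way to investigate this kind of blow-up is to measure the pickle size directly with `pickle.dumps`. The sketch below is not the asker's setup (their case involved a 100-tree random forest); it only shows that an unconstrained tree, which stores per-node arrays of thresholds, impurities, and values, pickles much larger than a shallow one:

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A depth-2 tree has at most 7 nodes; an unconstrained tree grows until
# the leaves are pure, so its node arrays (and thus its pickle) are larger.
small = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
big = DecisionTreeClassifier(random_state=0).fit(X, y)

print(len(pickle.dumps(small)), len(pickle.dumps(big)))
```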
7
votes
2 answers

How to remove training data from party:::ctree models?

I created several ctree models (about 40 to 80) which I want to evaluate rather often. An issue is that the model objects are very big (40 models require more than 2.8G of memory), and it appears to me that they store the training data, maybe as…
7
votes
3 answers

Native Java Solution to Decision Table

I'm having an interesting discussion with an esteemed colleague and would like some additional input... I need to implement some basic decision table logic in my application. I was looking to use OpenL Tablets, which represents decision data in an…
Elwood
  • 4,451
  • 4
  • 18
  • 20
7
votes
1 answer

Current node to next node feature combinations in decision tree learning: useful to determine potential interactions?

Using some guidance from this scikit-learn tutorial on understanding decision tree structures, I had the idea that perhaps looking at combinations of features occurring between two connected nodes might give some insight as to potential…
blacksite
  • 12,086
  • 10
  • 64
  • 109
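The parent-child feature pairs the question asks about can be enumerated from scikit-learn's low-level tree arrays (`tree_.children_left`, `tree_.children_right`, `tree_.feature`), the same structures the linked tutorial walks through. A sketch on Iris, purely illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

tree = clf.tree_
pairs = set()
for node in range(tree.node_count):
    for child in (tree.children_left[node], tree.children_right[node]):
        # children of a leaf are -1; leaves themselves have feature < 0,
        # so this keeps only internal-node -> internal-node edges.
        if child != -1 and tree.feature[child] >= 0:
            pairs.add((iris.feature_names[tree.feature[node]],
                       iris.feature_names[tree.feature[child]]))

print(sorted(pairs))  # feature combinations on connected splits
```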
7
votes
1 answer

sklearn min_impurity_decrease explanation

The definition of min_impurity_decrease in sklearn is: "A node will be split if this split induces a decrease of the impurity greater than or equal to this value." Using the Iris dataset and setting min_impurity_decrease = 0.0, how the tree looks…
Stev Allen
  • 71
  • 1
  • 1
  • 4
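The effect of the parameter is easiest to see by comparing node counts at two thresholds on the same data. A small sketch on Iris (the 0.05 threshold is an arbitrary illustrative value):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# With the default 0.0, any split that reduces impurity at all is allowed.
# Raising the threshold suppresses splits whose weighted impurity
# decrease falls below it, so the tree stays smaller.
full = DecisionTreeClassifier(min_impurity_decrease=0.0,
                              random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(min_impurity_decrease=0.05,
                                random_state=0).fit(X, y)

print(full.tree_.node_count, pruned.tree_.node_count)
```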
7
votes
2 answers

How do I get all Gini indices in my decision tree?

I have made a decision tree using scikit-learn, viz. sklearn.tree.DecisionTreeClassifier().fit(x, y). How do I get the Gini indices for all possible nodes at each step? graphviz only gives me the gini index of the…
vivian
  • 732
  • 1
  • 6
  • 18
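The Gini index of every node is available without graphviz: the fitted estimator's `tree_.impurity` array holds the criterion value for each node, indexed 0 to node_count - 1 with node 0 as the root. A minimal sketch on Iris:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# tree_.impurity stores the splitting criterion (Gini by default) for
# every node, not just the ones graphviz happens to render.
for node, gini in enumerate(clf.tree_.impurity):
    print(f"node {node}: gini = {gini:.4f}")
```

For the balanced Iris training set, the root's Gini is 1 - 3 * (1/3)^2 = 2/3.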
7
votes
1 answer

What is XGBoost pruning step doing?

When I use XGBoost to fit a model, it usually shows a list of messages like "updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned nodes, max_depth=5". I wonder how XGBoost is performing the tree pruning? I cannot find the…
DiveIntoML
  • 2,347
  • 2
  • 20
  • 36
7
votes
1 answer

interpreting Graphviz output for decision tree regression

I'm curious what the value field is in the nodes of the decision tree produced by Graphviz when used for regression. I understand that this is the number of samples in each class that are separated by a split when using decision tree classification…
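For a regressor, `tree_.value[node]` is not per-class sample counts but the mean target of the training samples reaching that node, i.e. the node's prediction. A sketch on a standard regression dataset (chosen only for illustration) that checks the root's value against the overall target mean:

```python
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# All training samples reach the root, so its stored value is simply
# the mean of y; deeper nodes hold the mean over their own subsets.
root_value = reg.tree_.value[0][0][0]
print(root_value, y.mean())
```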
7
votes
2 answers

Feature_importance vector in Decision Trees in SciKit Learn along with feature names

I am running the Decision Trees algorithm from SciKit Learn and I want to get the Feature_importance vector along with the features names so I can determine which features are dominant in the labeling process. Could you help me? Thank you.
AlK
  • 443
  • 2
  • 12
  • 19
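Pairing `feature_importances_` with the column names is a one-liner with `zip`; the vector is aligned with the training columns in order. A minimal sketch on Iris (illustrative data, not the asker's):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# feature_importances_[i] corresponds to the i-th training column,
# so zipping with the names and sorting gives a ranked list.
ranked = sorted(zip(iris.feature_names, clf.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```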
7
votes
1 answer

decision tree in R error:fit is not a tree,just a root

Good afternoon! I have a problem with a decision tree. f11<-as.factor(Z24train$f1) fit_f1 <- rpart(f11~TSU+TSL+TW+TP,data = Z24train,method="class") plot(fit_f1, uniform=TRUE, main="Classification Tree for Kyphosis") But this error appears: Error…
Emilia
  • 81
  • 1
  • 1
  • 2
7
votes
0 answers

Splitters in scikit learn decision tree

I'm trying to understand the implementation of the splitters of the decision tree in scikit-learn, but I am stuck at the point where it starts finding the best split. I need help understanding the algorithm used there. The code I need to understand is in…
Yank Leo
  • 452
  • 5
  • 19
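Before reading the Cython internals, it helps to see the two splitter strategies from the public API. `splitter="best"` scans every candidate threshold per considered feature and keeps the one with the largest impurity decrease; `splitter="random"` draws a random threshold per feature and keeps the best of those draws. A sketch comparing them on Iris:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same data, same seed; only the split-search strategy differs.
best = DecisionTreeClassifier(splitter="best", random_state=0).fit(X, y)
rand = DecisionTreeClassifier(splitter="random", random_state=0).fit(X, y)

# Random thresholds usually need more (or different) splits to reach
# pure leaves, so the two trees generally differ in shape.
print(best.get_depth(), rand.get_depth())
```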
7
votes
2 answers

Major assumptions of machine learning classifiers (LG, SVM, and decision trees)

In classical statistics, people usually state what assumptions are assumed (i.e. normality and linearity of data, independence of data). But when I am reading machine learning textbooks and tutorials, the underlying assumptions are not always…
KubiK888
  • 4,377
  • 14
  • 61
  • 115