Python scikit-learn: How do I convert decision tree leaves to dummy variables?

Question

I am using scikit-learn's DecisionTreeClassifier to build a decision tree. Assume that a given decision tree has 6 leaf/terminal nodes (A, B, C, D, E and F). I now want to label each of the original records with the leaf/terminal node it falls into (think of it as a form of feature engineering).

I would prefer not to score the records directly, but instead to build a collection of dummy variables from a variety of trees into a feature engineering pipeline.

Does anyone know of any easy approach for doing this?

An example is given here: http://scikit-learn.org/stable/auto_examples/ensemble/plot_feature_transformation.html#sphx-glr-auto-examples-ensemble-plot-feature-transformation-py. It uses GradientBoostingClassifier, but the same idea applies. – Vivek Kumar, Jun 22 '18 at 07:01
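For a single DecisionTreeClassifier, the idea can be sketched roughly as follows: `apply()` returns the index of the leaf each record lands in, and a OneHotEncoder turns those indices into dummy columns. The synthetic data, `max_leaf_nodes=6`, and the variable names are assumptions made for illustration only; they are not from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the original records (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Cap the tree at 6 leaves to mirror the A..F example in the question.
tree = DecisionTreeClassifier(max_leaf_nodes=6, random_state=0)
tree.fit(X, y)

# apply() gives, for every record, the node index of the leaf it ends up in.
leaf_ids = tree.apply(X)

# One dummy column per leaf; unseen leaf ids at transform time become all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
leaf_dummies = encoder.fit_transform(leaf_ids.reshape(-1, 1))

print(leaf_dummies.shape)  # (n_samples, number of leaves in the fitted tree)
```

To combine several trees, the same two steps could be repeated per tree and the resulting matrices stacked column-wise, for example with `scipy.sparse.hstack`.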

Jan K · Answer 1 · 2018-06-21T17:37:42.130

Something similar is implemented under ensemble.RandomTreesEmbedding. Note that n_estimators denotes the number of decision trees.

See docs here.
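A minimal sketch of how RandomTreesEmbedding might be used for this is shown below. Keep in mind that it builds totally random trees without looking at the target, so it is related to, but not the same as, encoding the leaves of a fitted DecisionTreeClassifier. The synthetic data and the parameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomTreesEmbedding

# Synthetic data standing in for the original records (assumption).
X, _ = make_classification(n_samples=200, n_features=8, random_state=0)

# n_estimators is the number of trees; each tree contributes one block of
# one-hot leaf indicator columns to the output.
embedder = RandomTreesEmbedding(n_estimators=5, max_depth=3, random_state=0)
X_leaves = embedder.fit_transform(X)  # sparse matrix of 0/1 leaf indicators

print(X_leaves.shape)
```

Each row of `X_leaves` contains exactly one 1 per tree, marking the leaf that record fell into.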

edited Jun 21 '18 at 17:37

answered Jun 21 '18 at 17:28


Thanks very much, I will look into it. – T. Scott Clendaniel Jun 21 '18 at 18:07
