
I am using scikit-learn's DecisionTreeClassifier to build a decision tree. Assume that a given tree has 6 leaf (terminal) nodes: A, B, C, D, E and F. I now want to label each of the original records with the leaf node it falls into (think of it as a form of feature engineering).

I would prefer not to score the records directly, but instead to build a collection of dummy variables from a variety of trees and feed them into a feature engineering pipeline.

Does anyone know of any easy approach for doing this?

  • An example is given here: http://scikit-learn.org/stable/auto_examples/ensemble/plot_feature_transformation.html#sphx-glr-auto-examples-ensemble-plot-feature-transformation-py. It uses GradientBoostingClassifier, but the same idea carries over to other tree models. – Vivek Kumar Jun 22 '18 at 07:01
  • Now THIS is positively brilliant!!! Thank you so very much! – T. Scott Clendaniel Jun 22 '18 at 16:51

1 Answer


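Something similar is implemented in sklearn.ensemble.RandomTreesEmbedding. It fits an ensemble of totally random trees (unsupervised, so no target is used) and encodes each record by the leaf it lands in for every tree, using a one-hot coding. Note that n_estimators sets the number of decision trees.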
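A minimal sketch of how this might look, assuming a synthetic dataset from make_classification stands in for the real records; the dataset, the parameter values and the variable names are illustrative only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomTreesEmbedding

# Hypothetical data standing in for the original records.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# n_estimators is the number of trees; max_depth limits the leaves per tree.
embedder = RandomTreesEmbedding(n_estimators=5, max_depth=3, random_state=0)

# The result is a sparse matrix with one dummy column per leaf per tree.
leaf_dummies = embedder.fit_transform(X)

# Every record falls into exactly one leaf of each tree.
row_sums = np.asarray(leaf_dummies.sum(axis=1)).ravel()
print(leaf_dummies.shape, row_sums[:5])
```

Because the trees are grown without looking at y, the leaves do not correspond to those of a fitted DecisionTreeClassifier.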

See docs here.
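If the leaves of your own supervised trees are what you need, one possible approach (not part of the answer above) is to call the apply method of each fitted DecisionTreeClassifier, which returns the index of the leaf each record ends up in, and then one-hot encode those indices. The sketch below assumes synthetic data and two hypothetical trees that differ only in max_leaf_nodes:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data standing in for the original records.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# A small "variety of trees"; the settings are illustrative only.
trees = [
    DecisionTreeClassifier(max_leaf_nodes=k, random_state=0).fit(X, y)
    for k in (4, 6)
]

# apply() returns, for each record, the node index of the leaf it reaches.
# Stacking gives one column of leaf ids per tree.
leaf_ids = np.column_stack([tree.apply(X) for tree in trees])

# One dummy variable per (tree, leaf) pair; leaf ids not seen during fit
# are encoded as all zeros instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
leaf_dummies = encoder.fit_transform(leaf_ids)

print(leaf_ids[:3])
print(leaf_dummies.shape)
```

The fitted encoder can be reused on new records by calling apply on each tree again and passing the stacked leaf ids to encoder.transform.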

Jan K