
I'm using a decision tree classifier to explore the MNIST dataset.

The features used to build the tree currently consist of the 26x26 pixels of each image.

My idea is to compute the number of connected components ("connexe parts") in each image and add this count to the data as an extra feature. I managed to do this with an image-processing algorithm.
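A minimal sketch of the feature computation, assuming NumPy and flattened images stored one per row. The question uses skimage's `morphology.label`; this version swaps in a plain breadth-first flood fill so it needs no extra dependencies. It also assumes one particular reading of "connexe parts" that matches the examples given in the comments (8 → 2, 6 and 9 → 1): it counts the background regions enclosed by the stroke, i.e. the holes. The function names, the `threshold` default and the square-image assumption are illustrative, not from the original.

```python
from collections import deque

import numpy as np


def count_enclosed_regions(image, threshold=0.5):
    """Count 4-connected background regions that do not touch the border.

    Hypothetical interpretation of "connexe parts": the number of holes
    in a digit (8 -> 2, 6 or 9 -> 1, 1 -> 0).
    """
    background = np.asarray(image) < threshold  # True where the pixel is "off"
    h, w = background.shape
    seen = np.zeros_like(background, dtype=bool)
    holes = 0
    for r in range(h):
        for c in range(w):
            if not background[r, c] or seen[r, c]:
                continue
            # Flood-fill this background region and note whether it reaches the border.
            touches_border = False
            queue = deque([(r, c)])
            seen[r, c] = True
            while queue:
                cy, cx = queue.popleft()
                if cy in (0, h - 1) or cx in (0, w - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and background[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            # A region cut off from the border is enclosed by the stroke: a hole.
            if not touches_border:
                holes += 1
    return holes


def add_region_feature(X, side=None, threshold=0.5):
    """Append the hole count of each flattened image as an extra last column."""
    X = np.asarray(X)
    if side is None:
        side = int(round(np.sqrt(X.shape[1])))  # assumes square images
    counts = np.array(
        [count_enclosed_regions(row.reshape(side, side), threshold) for row in X]
    )
    return np.column_stack([X, counts])
```

For raw MNIST pixels in the range 0 to 255, pass something like `threshold=128`. Using 4-connectivity for the background means a diagonal gap in the stroke does not merge a hole with the outside.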

Then I want to force the classifier's tree to split on this value, to see whether my idea is effective. Is there an easy way to do this?
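As far as I know, scikit-learn's `DecisionTreeClassifier` has no parameter that forces a particular split. One workaround, sketched here under that assumption, is to make the root split by hand (`feature <= threshold`) and grow an ordinary tree on each side. `ForcedRootSplitTree` is a hypothetical helper, not a scikit-learn class, and the result is two trees behind a manual split rather than a single fitted tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class ForcedRootSplitTree:
    """Hypothetical wrapper: the root split is fixed to X[:, feature] <= threshold,
    and a regular DecisionTreeClassifier is grown on each side of it."""

    def __init__(self, feature, threshold, **tree_kwargs):
        self.feature = feature
        self.threshold = threshold
        self.tree_kwargs = tree_kwargs  # forwarded to both sub-trees

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        left = X[:, self.feature] <= self.threshold
        if left.all() or not left.any():
            raise ValueError("forced split leaves one side empty; pick another threshold")
        self.left_ = DecisionTreeClassifier(**self.tree_kwargs).fit(X[left], y[left])
        self.right_ = DecisionTreeClassifier(**self.tree_kwargs).fit(X[~left], y[~left])
        return self

    def predict(self, X):
        X = np.asarray(X)
        left = X[:, self.feature] <= self.threshold
        pred = np.empty(len(X), dtype=self.left_.classes_.dtype)
        # Route each sample through the forced split, then through its sub-tree.
        if left.any():
            pred[left] = self.left_.predict(X[left])
        if (~left).any():
            pred[~left] = self.right_.predict(X[~left])
        return pred
```

With the new count appended as the last column, something like `ForcedRootSplitTree(feature=X.shape[1] - 1, threshold=0.5)` would separate images where the count is 0 from the rest. Comparing its held-out accuracy with a plain tree on the same data gives a rough sense of whether the feature helps; fitting a plain tree on the augmented data and reading `feature_importances_[-1]` is the check suggested in the comments.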

razzi
  • What do you mean by _connexe part_? Decision trees use **information gain** or **Gini impurity** to choose the feature to split on. You might be interested in computing these scores directly. If you really want to see this in a decision tree, one way is to inspect the `feature_importances_` attribute of an sklearn [DecisionTreeClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier). – mcb Dec 09 '17 at 16:09
  • The English word is probably not the right one: I use the function morphology.label http://scikit-image.org/docs/dev/api/skimage.morphology.html. For example, an 8 has 2 "connexe parts", while a 9 or a 6 has just 1, etc. A connexe part is a region of the image where all pixels have the same value and are connected. – razzi Dec 09 '17 at 16:34
  • Is it possible to compute the score (Gini or information gain) manually and pass it to the tree? That would be a solution. – razzi Dec 09 '17 at 16:43
  • You can extend the DecisionTreeClassifier and modify or add code in the `fit` method. But the better way to estimate the relevance of a particular feature is to look at the `feature_importances_` of a trained tree. See an example [here](http://scikit-learn.org/stable/modules/feature_selection.html#tree-based-feature-selection) – mcb Dec 11 '17 at 08:20
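On computing the score by hand: the tree does not take an externally supplied score, but the quantity it maximises at a node can be computed directly. Below is a minimal NumPy sketch of the Gini impurity decrease for the best single threshold on one feature, the criterion scikit-learn's `DecisionTreeClassifier` uses by default (`criterion="gini"`). The helper names `gini` and `best_split_gain` are hypothetical.

```python
import numpy as np


def gini(y):
    """Gini impurity 1 - sum_k p_k^2 of a label vector (0 for an empty node)."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split_gain(x, y):
    """Best impurity decrease from a single split x <= t on one feature.

    Returns (threshold, gain) with gain = G(parent) - weighted G(children),
    or (None, 0.0) if no split reduces impurity.
    """
    x, y = np.asarray(x), np.asarray(y)
    parent = gini(y)
    values = np.unique(x)
    best_t, best_gain = None, 0.0
    # Candidate thresholds: midpoints between consecutive distinct values.
    for t in (values[:-1] + values[1:]) / 2:
        left = x <= t
        n_left = left.sum()
        child = (n_left * gini(y[left]) + (len(y) - n_left) * gini(y[~left])) / len(y)
        if parent - child > best_gain:
            best_t, best_gain = t, parent - child
    return best_t, best_gain
```

Running `best_split_gain(X[:, j], y)` for each column `j` and comparing the new feature's gain with the pixels' gains shows whether it would be picked at the root. `feature_importances_` on a fitted tree sums (and normalises) the weighted impurity decreases over all nodes that split on each feature.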
