Say I have a multiclass classification problem that is inherently hierarchical, e.g. with the labels 'edible', 'nutritious' and '~nutritious', so it can be represented like so:
├── edible
│ ├── nutritious
│ └── ~nutritious
└── ~edible
While one can get reasonable performance with classifiers that support multiclass classification, or by using one-vs-one/one-vs-all schemes for those that don't, it may also be beneficial to train a separate classifier at each level and chain them, so that the instances classified as 'edible' can then be classified as either 'nutritious' or '~nutritious'.
I would like to use scikit-learn estimators as building blocks, and I am wondering if I can make Pipeline support this, or if I would need to write my own estimator that subclasses BaseEstimator (and possibly BaseEnsemble) to do this.
This has been mentioned before by @ogrisel on the mailing list (http://sourceforge.net/mailarchive/message.php?msg_id=31417048), and I'm wondering if anyone has insights or suggestions on how to go about doing this.
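
To make the idea concrete, here is a rough sketch of the custom-estimator route for the two-level tree above. The class name TwoLevelClassifier and its parameters (root_estimator, child_estimator, parent_label, other_label) are made up for illustration and are not part of scikit-learn. It assumes y holds the leaf labels ('nutritious', '~nutritious', '~edible') and falls back to LogisticRegression at both levels. As far as I can tell, Pipeline passes every sample through every step, so routing only the 'edible' subset to a second classifier seems to need something like this.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression


class TwoLevelClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical sketch: a root classifier picks the top-level class,
    then a child classifier refines the samples routed to `parent_label`."""

    def __init__(self, root_estimator=None, child_estimator=None,
                 parent_label="edible", other_label="~edible"):
        self.root_estimator = root_estimator
        self.child_estimator = child_estimator
        self.parent_label = parent_label
        self.other_label = other_label

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        # Every leaf label except `other_label` belongs under `parent_label`.
        is_parent = y != self.other_label
        root = self.root_estimator if self.root_estimator is not None else LogisticRegression()
        child = self.child_estimator if self.child_estimator is not None else LogisticRegression()
        # Level 1: parent vs. other, trained on all samples.
        y_top = np.where(is_parent, self.parent_label, self.other_label)
        self.root_ = clone(root).fit(X, y_top)
        # Level 2: leaf labels, trained only on the parent subset.
        self.child_ = clone(child).fit(X[is_parent], y[is_parent])
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        X = np.asarray(X)
        top = self.root_.predict(X)
        mask = top == self.parent_label
        out = np.empty(X.shape[0], dtype=self.classes_.dtype)
        out[~mask] = self.other_label
        if mask.any():
            # Only samples predicted as the parent class reach the child.
            out[mask] = self.child_.predict(X[mask])
        return out
```

Usage would be the usual TwoLevelClassifier().fit(X, y).predict(X_new). Note that a mistake at the root level cannot be corrected by the child, which is one trade-off of this design compared with a flat multiclass classifier.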