I'm currently working on some data mining projects. I have to classify given data sets (.csv format) into classes using decision tree induction, with GINIsplit (the Gini index) as the splitting criterion. I'm implementing all of this in plain Java, without tools such as WEKA or Orange.

My question: what is the best data structure for representing the decision tree so that classification is fast and efficient? And are there attribute-specific optimization techniques, i.e. techniques that depend on whether an attribute is nominal, numeric, or ordinal?

Thanks in advance!

Jivan

1 Answer

Well, if you really want optimal classification speed, output your decision tree to... .class. That is, generate source code for the tree and compile it. That way, evaluation runs at the native speed of your Java HotSpot JVM.

Because you can encode a decision tree in program logic:

if (attribute_x < 0.1) {
    switch(attribute_c) {
        case BANANA: {
            ...

The main question is how far you want to take these optimizations.
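To make the code-generation idea concrete, here is a minimal sketch of a generator that walks a tree and emits nested `if`/`else` statements as Java source text. The names `TreeCodeGen`, `Node`, `emit`, and `generate` are hypothetical, and the sketch assumes only binary numeric threshold tests; nominal attributes would need a `switch` branch as in the fragment above. Compiling the emitted text (for example with the JDK's compiler API) is left out.

```java
// Hypothetical sketch: turns a small decision tree into Java source text.
// Assumes binary splits of the form "attribute < threshold" only.
final class TreeCodeGen {

    // A node is either a leaf (label != null) or a numeric threshold test.
    static final class Node {
        final String attribute;  // variable name tested at an inner node
        final double threshold;  // take the left branch if attribute < threshold
        final Node left;
        final Node right;
        final String label;      // class label, set only for leaves

        Node(String attribute, double threshold, Node left, Node right) {
            this.attribute = attribute;
            this.threshold = threshold;
            this.left = left;
            this.right = right;
            this.label = null;
        }

        Node(String label) {
            this.attribute = null;
            this.threshold = 0.0;
            this.left = null;
            this.right = null;
            this.label = label;
        }
    }

    // Recursively appends the statements for the subtree rooted at n.
    static void emit(Node n, String indent, StringBuilder out) {
        if (n.label != null) {
            out.append(indent).append("return \"").append(n.label).append("\";\n");
            return;
        }
        out.append(indent).append("if (").append(n.attribute)
           .append(" < ").append(n.threshold).append(") {\n");
        emit(n.left, indent + "    ", out);
        out.append(indent).append("} else {\n");
        emit(n.right, indent + "    ", out);
        out.append(indent).append("}\n");
    }

    // Returns the body of a classify method for the whole tree.
    static String generate(Node root) {
        StringBuilder out = new StringBuilder();
        emit(root, "", out);
        return out.toString();
    }
}
```

The returned string is meant to be pasted into (or wrapped by) a method whose parameters are named after the attributes. Whether the extra build step pays off depends on how much classification time actually matters in your application.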

Has QUIT--Anony-Mousse
  • What about nominal and ordinal attributes? @Anony-Mousse – Jivan Jan 12 '13 at 14:34
  • I'm asking about the data structure used to implement the decision tree, one that is platform independent. @Anony-Mousse – Jivan Jan 12 '13 at 14:47
  • Whatever you prefer. It's not as if attribute ID + threshold/bitmask/list-of-choices is a large data object that could cause inefficiencies. Did you profile, and identify your decision tree data structure as inefficient? – Has QUIT--Anony-Mousse Jan 12 '13 at 15:44
  • I mean, should I use a B-tree, a B+ tree, or simply a linked list for the implementation? @Anony-Mousse – Jivan Jan 12 '13 at 15:52
  • B-trees and B+ trees are **disk** data structures. What would you use as the *sorting key*?!? I don't see a use for a linked list in decision trees either. Use a tree node class, with children. – Has QUIT--Anony-Mousse Jan 13 '13 at 09:33
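
The "tree node class, with children" suggested in the comments could look roughly like the sketch below. It assumes each instance is a `double[]` row, with nominal values encoded as small integer category codes. The class names (`TreeNode`, `Leaf`, `ThresholdNode`, `NominalNode`) and that encoding are illustrative assumptions, not something the thread prescribes.

```java
// Hypothetical node hierarchy; each node stores an attribute index plus
// either a threshold or one child per category.
abstract class TreeNode {
    // Returns the predicted class label for one instance.
    abstract String classify(double[] row);
}

// Leaf: carries the class label assigned to instances that reach it.
final class Leaf extends TreeNode {
    private final String label;

    Leaf(String label) {
        this.label = label;
    }

    @Override
    String classify(double[] row) {
        return label;
    }
}

// Numeric or ordinal attribute: binary split on a threshold.
final class ThresholdNode extends TreeNode {
    private final int attribute;      // column index in the row
    private final double threshold;
    private final TreeNode below;     // taken when row[attribute] < threshold
    private final TreeNode atOrAbove; // taken otherwise

    ThresholdNode(int attribute, double threshold, TreeNode below, TreeNode atOrAbove) {
        this.attribute = attribute;
        this.threshold = threshold;
        this.below = below;
        this.atOrAbove = atOrAbove;
    }

    @Override
    String classify(double[] row) {
        return (row[attribute] < threshold ? below : atOrAbove).classify(row);
    }
}

// Nominal attribute: one child per category code (0, 1, 2, ...).
// Assumes every code seen at classification time has a child.
final class NominalNode extends TreeNode {
    private final int attribute;
    private final TreeNode[] children;

    NominalNode(int attribute, TreeNode[] children) {
        this.attribute = attribute;
        this.children = children;
    }

    @Override
    String classify(double[] row) {
        return children[(int) row[attribute]].classify(row);
    }
}
```

Classification is then a single walk from the root to a leaf, for example `root.classify(new double[] {0.05, 1})`. Ordinal attributes can reuse `ThresholdNode` if their levels are encoded as increasing numbers.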