
I've just started learning about PMML, and I think the TreeModel almost fits what I'm trying to achieve, but I have a question I haven't been able to answer from the documentation:

Is it possible to make a TreeModel return multiple values? I've found some examples of TreeModels, but all of them declare a single "predicted" field. What I need is for the model to return multiple values when the predicate of a node evaluates to TRUE. Is that even possible? If so, how would you implement it?

EDIT

Added an example of what I want to achieve:


In the TreeModel documentation, in the section Scoring Procedure, there's an example of a TreeModel named "golfing". In that example (please correct me if I'm wrong), the logic that determines which value is assigned to the predicted field "whatIdo" once the model is evaluated could be expressed this way:

if(outlook=="sunny") {
    whatIdo="will play";
    if(temperature<90 AND temperature>50){
        whatIdo="will play";
        if(humidity<80){
            whatIdo="will play";
        }
        else if(humidity>=80){
            whatIdo="no play";
        }
    }
    else if(temperature>=90 OR temperature<=50){
        whatIdo="no play";
    }
    
}
else if(outlook=="overcast" OR outlook=="rain"){
    whatIdo="may play";
    if(temperature > 60 AND temperature < 100 AND outlook=="overcast" AND humidity <70 AND windy=="false"){
        whatIdo="may play";
    }
    else if(outlook=="rain" AND humidity<70 ){
        whatIdo="no play";
    }
}

What I need to know is whether, apart from the whatIdo field, I could return other values, for example an additional field named "whatElseIdo". Would it be possible to create a PMML model, based for example on the "golfing" model, that returns an extra field the way the following conditional does:

if(outlook=="sunny") {
    whatIdo="will play";
    whatElseIdo="will have a picnic";
    if(temperature<90 AND temperature>50){
        whatIdo="will play";
        whatElseIdo="will have a picnic";
        if(humidity<80){
            whatIdo="will play";
            whatElseIdo="will have a picnic";
        }
        else if(humidity>=80){
            whatIdo="no play";
            whatElseIdo="no have a picnic";
        }
    }
    else if(temperature>=90 OR temperature<=50){
        whatIdo="no play";
        whatElseIdo="no have a picnic";
    }

}
else if(outlook=="overcast" OR outlook=="rain"){
    whatIdo="may play";
    whatElseIdo="may have a picnic";
    if(temperature > 60 AND temperature < 100 AND outlook=="overcast" AND humidity <70 AND windy=="false"){
        whatIdo="may play";
        whatElseIdo="may have a picnic";
    }
    else if(outlook=="rain" AND humidity<70 ){
        whatIdo="no play";
        whatElseIdo="no have a picnic";
    }
}

Thanks.

Axel

1 Answer


PMML operates with scalar values. It is possible to "emulate" a collection-like behaviour if the prediction of a TreeModel is a string value that encodes multiple values in an application-specific data format.

For example, you could encode multiple fruit values as a comma-separated list:

<TreeModel>
  <Node score="apple,orange,pineapple">
    <True/>
  </Node>
</TreeModel>

However, it is a good idea to keep such business logic out of your PMML files. Have your TreeModel predict a scalar value, and perform the mapping from one value space to another (e.g. "fruitbasket_11" -> "apple,orange,pineapple") in some other application layer.
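As a rough illustration of that application-layer mapping, here is a minimal Python sketch. The names `FRUIT_BASKETS` and `expand_prediction` are made up for this example, and it assumes the scalar prediction has already been obtained from whatever PMML engine you use:

```python
# Hypothetical lookup table that lives in the application, not in the PMML file.
FRUIT_BASKETS = {
    "fruitbasket_11": ["apple", "orange", "pineapple"],
}


def expand_prediction(score):
    """Map a scalar TreeModel prediction to the list of values it stands for."""
    try:
        return FRUIT_BASKETS[score]
    except KeyError:
        # Fail loudly on a score the application does not know about.
        raise ValueError(f"unknown prediction: {score!r}") from None


# Assumed: "fruitbasket_11" is the value the TreeModel predicted.
fruits = expand_prediction("fruitbasket_11")
```

The PMML file then only needs `score="fruitbasket_11"`, and the table can change without touching the model.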

Updated for Edit

A decision tree is a classical supervised learning method. It is trained on a dataset that has a single predicted field. Therefore, the TreeModel element also supports only a single predicted field.

However, PMML is rather flexible and lets you work around this limitation if you really need to.

Some more ideas:

  • Rework the above answer so that the score attribute represents a Map, not a List. For example <Node score="firstfruit=apple,secondfruit=orange,thirdfruit=pineapple">.
  • If you need to represent exactly two predicted fields, and the second predicted field has unique values (i.e. has identifier-like properties), then it can be stored as the id attribute. For example, <Node score="may play" id="golfing_location_11">. The value of the id attribute is made available as the entityId output feature.
  • For every predicted field, have a separate TreeModel element. Then, combine all those TreeModel elements into a master model using PMML's model segmentation mechanism.
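For the first idea (a Map encoded in the score attribute), the consuming application has to decode the string itself. A minimal Python sketch follows, assuming a hypothetical `key=value,key=value` encoding in which neither keys nor values contain `,` or `=`. The function name `decode_score` and the field names from the question are used purely for illustration:

```python
def decode_score(score):
    """Split 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    fields = {}
    for pair in score.split(","):
        key, sep, value = pair.partition("=")
        if not sep:
            # No '=' found, so this is not a key=value pair.
            raise ValueError(f"malformed pair: {pair!r}")
        fields[key.strip()] = value.strip()
    return fields


# Assumed: this string is what the TreeModel returned as its single prediction.
result = decode_score("whatIdo=no play,whatElseIdo=no have a picnic")
what_i_do = result["whatIdo"]
what_else_i_do = result["whatElseIdo"]
```

This keeps the TreeModel itself single-valued. The trade-off is that the encoding is an application convention that PMML tooling knows nothing about.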
user1808924
  • Thanks for the quick reply. I added an example to my question to help clarify what I'm trying to achieve. Please see the **Edit** – Axel Aug 12 '15 at 20:47