
I want to add an additional target ("outputState") to my PMML regression model.

  • outputState = 0: no missing/invalid input values (-> no imputation in the regression model)
  • outputState = 1: there are missing/invalid input values (-> imputation in the regression model)

I tried to work with multiple models, but I don't know how to handle multiple models/targets/outputs correctly.

Example (explanation below):

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <PMML xmlns="http://www.dmg.org/PMML-4_3" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
    <Header>
      <Application name="JPMML-R" version="1.3.14"/>
      <Timestamp>2020-01-07T15:56:07Z</Timestamp>
    </Header>
    <DataDictionary>
      <DataField name="outputState" optype="categorical" dataType="integer"/>
      <DataField name="outputResult" optype="continuous" dataType="double"/>
      <DataField name="inputA" optype="continuous" dataType="double">
        <Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
        <Value property="missing" value="NA"/>
      </DataField>
      <DataField name="inputB" optype="continuous" dataType="double">
        <Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
        <Value property="missing" value="NA"/>
      </DataField>
      <DataField name="inputC" optype="continuous" dataType="double">
        <Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
        <Value property="missing" value="NA"/>
      </DataField>
    </DataDictionary>
    <TransformationDictionary/>
    <MiningModel functionName="mixed">
    <MiningSchema>
      <MiningField name="outputState" usageType="target"/>
      <MiningField name="outputResult" usageType="target"/>
      <MiningField name="inputA"/>
      <MiningField name="inputB"/>
      <MiningField name="inputC"/>
    </MiningSchema>
    <Output>
      <OutputField name="outputState" optype="categorical" dataType="integer" targetField="outputState"/>
      <OutputField name="outputResult" optype="continuous" dataType="double" targetField="outputResult"/>
    </Output>
    <Segmentation multipleModelMethod="selectAll">
      <Segment id="1">
        <True/>
        <TreeModel modelName="TEST" functionName="classification" noTrueChildStrategy="returnLastPrediction">
          <MiningSchema>
            <MiningField name="outputState" usageType="target"/>
            <MiningField name="inputA" invalidValueTreatment="asMissing"/>
            <MiningField name="inputB" invalidValueTreatment="asMissing"/>
            <MiningField name="inputC" invalidValueTreatment="asMissing"/>
          </MiningSchema>
          <Node score="0">
            <True/>
            <Node score="1">
              <CompoundPredicate booleanOperator="or">
                <SimplePredicate field="inputA" operator="isMissing"/>
                <SimplePredicate field="inputB" operator="isMissing"/>
                <SimplePredicate field="inputC" operator="isMissing"/>
              </CompoundPredicate>
            </Node>
          </Node>
        </TreeModel>
      </Segment>
      <Segment id="2">
        <True/>
        <RegressionModel functionName="regression">
          <MiningSchema>
            <MiningField name="outputResult" usageType="target"/>
            <MiningField name="inputA" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
            <MiningField name="inputB" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
            <MiningField name="inputC" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
          </MiningSchema>
          <RegressionTable intercept="2">
            <NumericPredictor name="inputA" coefficient="1"/>
            <NumericPredictor name="inputB" coefficient="2"/>
            <NumericPredictor name="inputC" coefficient="3"/>
          </RegressionTable>
        </RegressionModel>
      </Segment>
    </Segmentation>
    </MiningModel>
    </PMML>

Explanation:

  1. DataDictionary (with left and right margins)
  2. MiningModel (functionName="mixed" seems to be wrong? Is Segmentation multipleModelMethod="selectAll" wrong too?):
    • output definition (seems to be wrong too, because of the different targets?)
    • simple classification TreeModel (to detect missing/imputed values) -> target: outputState
    • simple regression model -> target: outputResult

Does anyone have an idea or a better suggestion?

D_H
  • What are you trying to accomplish? Is it about manually coding a PMML (pseudo-)model which returns a boolean indicating whether some field (e.g. 'inputA') is missing or invalid? If so, then you should define a no-op model (e.g. an empty RegressionModel element) and add an Output element to it containing OutputField child elements (with feature 'transformedValue') that test missingness and validity using the 'isMissing', 'isValid', 'isNotMissing' and 'isNotValid' built-in functions. – user1808924 Apr 01 '20 at 17:28
  • thx, it worked with 'Output' and 'transformedValue'! – D_H Apr 06 '20 at 13:52
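For later readers, the approach from the comment above might look roughly like the fragment below. It is only a sketch, not the exact markup that was used in the end: it computes outputState as an OutputField with feature="transformedValue" instead of a second target, and it assumes the Output element is placed inside a model whose MiningSchema lists inputA, inputB and inputC with invalidValueTreatment="asMissing" and without missingValueReplacement, so that invalid values also count as missing and are not replaced before the check. Only 'isMissing' is used here; whether the 'isValid'/'isNotValid' functions are available depends on the PMML version and the evaluator.

```xml
<!-- Sketch: place inside a model element (e.g. the no-op model suggested in the comment). -->
<Output>
  <!-- outputState = 1 if at least one input is missing (or invalid and treated as missing), else 0 -->
  <OutputField name="outputState" optype="categorical" dataType="integer" feature="transformedValue">
    <Apply function="if">
      <Apply function="or">
        <Apply function="isMissing"><FieldRef field="inputA"/></Apply>
        <Apply function="isMissing"><FieldRef field="inputB"/></Apply>
        <Apply function="isMissing"><FieldRef field="inputC"/></Apply>
      </Apply>
      <Constant dataType="integer">1</Constant>
      <Constant dataType="integer">0</Constant>
    </Apply>
  </OutputField>
</Output>
```

With this layout, outputState no longer needs to be declared as a target in the DataDictionary or MiningSchema, since it is derived from the inputs rather than predicted.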
