Hi, I have a PMML file generated for a logistic regression model using R, as follows. Only the first part of the PMML is shown here.

<?xml version="1.0"?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd">
 <Header copyright="Copyright (c) 2015 Upeksha" description="Generalized Linear Regression Model">
  <Extension name="user" value="Upeksha" extender="Rattle/PMML"/>
  <Application name="Rattle/PMML" version="1.4"/>
  <Timestamp>2015-12-02 08:41:27</Timestamp>
 </Header>
 <DataDictionary numberOfFields="11">
  <DataField name="ResponseAccountName" optype="continuous" dataType="double"/>
  <DataField name="RegionCat" optype="categorical" dataType="string">
   <Value value="ROW"/>
   <Value value="EUROPE"/>
   <Value value="NAM"/>
  </DataField>
  <DataField name="TitleCat" optype="categorical" dataType="string">
   <Value value="1"/>
   <Value value="2"/>
   <Value value="3"/>
   <Value value="4"/>
  </DataField>
  <DataField name="RLMaxTitle" optype="categorical" dataType="string">
   <Value value="1"/>
   <Value value="2"/>
   <Value value="3"/>
   <Value value="4"/>
  </DataField>
  <DataField name="Act1_rate" optype="continuous" dataType="double"/>
  <DataField name="Act2_rate" optype="continuous" dataType="double"/>
  <DataField name="Act3_rate" optype="continuous" dataType="double"/>
  <DataField name="Act4_rate" optype="continuous" dataType="double"/>
  <DataField name="Act5_rate" optype="continuous" dataType="double"/>
  <DataField name="Act6_rate" optype="continuous" dataType="double"/>
  <DataField name="AccntAct_rate" optype="continuous" dataType="double"/>
 </DataDictionary>
 <GeneralRegressionModel modelName="Logistic_Regression" modelType="generalizedLinear" functionName="regression" algorithmName="glm" distribution="binomial" linkFunction="logit">
  <MiningSchema>
   <MiningField name="ResponseAccountName" usageType="predicted"/>
   <MiningField name="RegionCat" usageType="active"/>
   <MiningField name="TitleCat" usageType="active"/>
   <MiningField name="RLMaxTitle" usageType="active"/>
   <MiningField name="Act1_rate" usageType="active"/>
   <MiningField name="Act2_rate" usageType="active"/>
   <MiningField name="Act3_rate" usageType="active"/>
   <MiningField name="Act4_rate" usageType="active"/>
   <MiningField name="Act5_rate" usageType="active"/>
   <MiningField name="Act6_rate" usageType="active"/>
   <MiningField name="AccntAct_rate" usageType="active"/>
  </MiningSchema>
  <Output>
   <OutputField name="Predicted_ResponseAccountName" feature="predictedValue"/>
  </Output>

The dataType attribute of the OutputField is not present here. How would a PMML reader determine its type in that case?

I checked the PMML spec, and it says that dataType is not always required for OutputField. I am writing a PMML reader and need to know how a document like this should be interpreted.

DesirePRG

1 Answer

The dataType and optype attributes are optional for the OutputField element, but any sane PMML producer should specify them anyway, as that makes life much easier for PMML consumers.

If the dataType attribute is missing, you can infer it from the feature attribute of the OutputField element. In the current case, feature is set to predictedValue, which means that the data type and the operational type are "copied" from the DataField element that represents the target field of this model. Here, the target field (aka the predicted field) is "ResponseAccountName", whose DataField declares optype="continuous" and dataType="double", so the value of this OutputField element is a continuous double.
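As a rough illustration of that lookup, here is a minimal Python sketch using only the standard library. The function name infer_output_field_type and the trimmed SAMPLE document are made up for this example. It also assumes that a missing feature attribute can be treated as predictedValue, and that the target is the MiningField whose usageType is "predicted" (or "target"). Other feature values are deliberately left unhandled.

```python
import xml.etree.ElementTree as ET

PMML_NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Trimmed-down version of the document from the question (illustration only).
SAMPLE = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
 <DataDictionary numberOfFields="2">
  <DataField name="ResponseAccountName" optype="continuous" dataType="double"/>
  <DataField name="Act1_rate" optype="continuous" dataType="double"/>
 </DataDictionary>
 <GeneralRegressionModel modelName="Logistic_Regression" functionName="regression">
  <MiningSchema>
   <MiningField name="ResponseAccountName" usageType="predicted"/>
   <MiningField name="Act1_rate" usageType="active"/>
  </MiningSchema>
  <Output>
   <OutputField name="Predicted_ResponseAccountName" feature="predictedValue"/>
  </Output>
 </GeneralRegressionModel>
</PMML>"""


def infer_output_field_type(root, model, output_field):
    """Return (optype, dataType) for an OutputField, inferring missing values.

    Hypothetical helper: only the predictedValue case is covered.
    """
    optype = output_field.get("optype")
    data_type = output_field.get("dataType")
    if optype and data_type:
        return optype, data_type  # the producer was explicit, nothing to infer

    # Assumption: an absent feature attribute is treated as predictedValue.
    feature = output_field.get("feature", "predictedValue")
    if feature != "predictedValue":
        raise NotImplementedError(f"no inference rule sketched for feature={feature!r}")

    # Prefer an explicit targetField; otherwise take the first target MiningField.
    target_name = output_field.get("targetField")
    if target_name is None:
        for mining_field in model.findall("pmml:MiningSchema/pmml:MiningField", PMML_NS):
            if mining_field.get("usageType") in ("predicted", "target"):
                target_name = mining_field.get("name")
                break
    if target_name is None:
        raise ValueError("model declares no target field")

    # Copy whatever is missing from the target's DataField declaration.
    for data_field in root.findall("pmml:DataDictionary/pmml:DataField", PMML_NS):
        if data_field.get("name") == target_name:
            return optype or data_field.get("optype"), data_type or data_field.get("dataType")
    raise ValueError(f"target field {target_name!r} is not in the DataDictionary")


root = ET.fromstring(SAMPLE)
model = root.find("pmml:GeneralRegressionModel", PMML_NS)
field = model.find("pmml:Output/pmml:OutputField", PMML_NS)
result = infer_output_field_type(root, model, field)
print(result)
```

Explicit optype and dataType attributes always win here; the DataDictionary is only used to fill in whatever the producer left out.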

user1808924