I'm currently trying to build a module for the KNIME analytics platform. The module should generate a PMML model and pass it on as its output.

So far I've only been able to accomplish this by manually creating a `PMMLDocument` and then returning `new PMMLPortObject((PMMLPortObjectSpec) out_spec, pmmlDoc)`.

My question is whether creating the PMML document manually is the right approach here, or whether there is a more streamlined way to do this, maybe via templating or something similar?

Currently, generating a PMML model manually like so:

    // Build the PMML document by hand, element by element.
    PMMLDocument resDoc = PMMLDocument.Factory.newInstance();
    PMML pmml = PMML.Factory.newInstance();
    pmml.setVersion("4.2");

    // Header: copyright notice plus the generating application.
    Header header = pmml.addNewHeader();
    header.setCopyright("some custom made copyright");
    Application application = header.addNewApplication();
    application.setName("KNIME");
    application.setVersion("2.10.3");
    ...

can get quite tedious, and it makes me wonder whether this is actually best practice or not.

andrei
  • Have you tried using the result (`org.dmg.pmml.PMMLDocument.PMML`) of the `pmml.addNewPMML()` method? Version 4.2 of the PMML schema was used to generate the classes representing the models. – Gábor Bakos Oct 27 '14 at 14:37
  • Hmm, isn't that just shorthand for `PMML pmml = PMML.Factory.newInstance(); resDoc.addPMML(pmml);`? In that case I would still need to create the PMML structure manually myself. – andrei Oct 27 '14 at 14:56
  • What kind of model would you like to add? `PMML` has methods to add (remove, insert, ...) the different (PMML 4.2) models. After that, yes, you have to manually set the parameters and submodels of your model. – Gábor Bakos Oct 27 '14 at 15:01
  • Currently it's just a simple linear regression model, and yes, that's what I'm using right now: adds and sets to create the full structure. The full code and its result are here: http://pastebin.com/ggzeUJzR. I'm wondering whether this is the way it is generally done. – andrei Oct 27 '14 at 15:12
  • I think this is the way to go. (Probably in a more structured way. I guess this should be split into multiple methods.) – Gábor Bakos Oct 27 '14 at 15:24

1 Answer

Yes, that is pretty much it. The PMML standard is an XML specification, so what you're doing is filling out all of the fields the spec requires. Usually you would write a procedure that is called for each similar, repetitive subpart of your model, e.g., a node in a decision tree.

And yes, it is quite repetitive until you get the structure down.
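To illustrate the "one procedure per repetitive subpart" idea for the linear regression case from the comments, here is a minimal sketch. It deliberately uses the plain JDK DOM API rather than KNIME's generated PMML classes, so it only shows the structure of the approach, not the KNIME calls themselves. The class name `PmmlSketch`, the helper method names, and the field names `y`, `x1`, `x2` are all made up for the example, and the output is not validated against the PMML 4.2 schema.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: plain JDK DOM, not the KNIME/XMLBeans PMML classes.
public class PmmlSketch {
    public static final String NS = "http://www.dmg.org/PMML-4_2";

    // Creates a child element in the PMML namespace and appends it to the parent.
    static Element child(Element parent, String name) {
        Element e = parent.getOwnerDocument().createElementNS(NS, name);
        parent.appendChild(e);
        return e;
    }

    // Header with the same values as in the question.
    static void addHeader(Element pmml, String copyright) {
        Element header = child(pmml, "Header");
        header.setAttribute("copyright", copyright);
        Element app = child(header, "Application");
        app.setAttribute("name", "KNIME");
        app.setAttribute("version", "2.10.3");
    }

    // One DataDictionary entry; assumes every field is a continuous double.
    static void addDataField(Element dictionary, String name) {
        Element field = child(dictionary, "DataField");
        field.setAttribute("name", name);
        field.setAttribute("optype", "continuous");
        field.setAttribute("dataType", "double");
    }

    // One MiningSchema entry; only the target gets an explicit usageType.
    static void addMiningField(Element schema, String name, boolean isTarget) {
        Element field = child(schema, "MiningField");
        field.setAttribute("name", name);
        if (isTarget) {
            field.setAttribute("usageType", "target");
        }
    }

    // The repetitive subpart of a regression model: one call per coefficient.
    static void addNumericPredictor(Element table, String field, double coefficient) {
        Element predictor = child(table, "NumericPredictor");
        predictor.setAttribute("name", field);
        predictor.setAttribute("coefficient", Double.toString(coefficient));
    }

    public static Document buildLinearRegression(String target, double intercept,
            Map<String, Double> coefficients) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();
        Element pmml = doc.createElementNS(NS, "PMML");
        pmml.setAttribute("version", "4.2");
        doc.appendChild(pmml);
        addHeader(pmml, "some custom made copyright");

        Element dictionary = child(pmml, "DataDictionary");
        Element model = child(pmml, "RegressionModel");
        model.setAttribute("functionName", "regression");
        Element schema = child(model, "MiningSchema");
        Element table = child(model, "RegressionTable");
        table.setAttribute("intercept", Double.toString(intercept));

        addDataField(dictionary, target);
        addMiningField(schema, target, true);
        for (Map.Entry<String, Double> c : coefficients.entrySet()) {
            addDataField(dictionary, c.getKey());
            addMiningField(schema, c.getKey(), false);
            addNumericPredictor(table, c.getKey(), c.getValue());
        }
        return doc;
    }

    // Serializes the DOM tree to an indented XML string.
    public static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Double> coefficients = new LinkedHashMap<>();
        coefficients.put("x1", 0.5);
        coefficients.put("x2", -1.25);
        System.out.println(toXml(buildLinearRegression("y", 2.0, coefficients)));
    }
}
```

The same split should carry over to the generated PMML classes: keep one small method per element type (header, data field, mining field, predictor) and let a single top-level method wire them together, so adding another predictor or model type does not mean copying a long block of adds and sets.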