There are two approaches to exporting Apache Spark models to the PMML data format. If you are working at the Spark ML abstraction level, you can use the JPMML-SparkML library. If you are working at the Spark MLlib abstraction level, which appears to be the case here, you can use the built-in PMMLExportable trait.
JPMML-SparkML retrieves column names from the Spark ML data schema via DataFrame#schema(). Unfortunately, Spark MLlib has no such option, so the feature names "field_{n}" and the label name "target" are simply hard-coded dummy names.
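Because the dummy names follow a fixed pattern (a 0-based "field_" index for each feature, plus "target" for the label), the mapping back to the real column names can be built mechanically. The following is a minimal sketch, assuming that you know the real column names and that they are listed in the same order as the MLlib feature vector; the class MllibFieldNames and the method dummyToRealNames are hypothetical helpers, not part of Spark or JPMML.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MllibFieldNames {

    // Hypothetical helper: maps the dummy names written by Spark MLlib
    // ("field_0", "field_1", ..., "target") to real column names.
    // Assumes columnNames is ordered like the MLlib feature vector.
    static Map<String, String> dummyToRealNames(String[] columnNames, String labelName) {
        // LinkedHashMap keeps the features in vector order
        Map<String, String> renames = new LinkedHashMap<>();
        for (int i = 0; i < columnNames.length; i++) {
            renames.put("field_" + i, columnNames[i]);
        }
        // The label (for models that have one) is exported as "target"
        renames.put("target", labelName);
        return renames;
    }

    public static void main(String[] args) {
        Map<String, String> renames = dummyToRealNames(new String[]{"x1", "x2"}, "y");
        System.out.println(renames); // prints {field_0=x1, field_1=x2, target=y}
    }
}
```

Each entry of the resulting map is one old name/new name pair that can be passed to a FieldRenamer.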
It is fairly easy to rename fields in the PMML document using the JPMML-Model library:
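pmmlExportable.toPMML("/tmp/raw-pmml-file");
org.dmg.pmml.PMML pmml;
try(java.io.InputStream is = new java.io.FileInputStream("/tmp/raw-pmml-file")){
	pmml = org.jpmml.model.PMMLUtil.unmarshal(is);
}
// Rename the field "target" (and all references to it) to "y"
org.jpmml.model.visitors.FieldRenamer targetRenamer = new org.jpmml.model.visitors.FieldRenamer(org.dmg.pmml.FieldName.create("target"), org.dmg.pmml.FieldName.create("y"));
targetRenamer.applyTo(pmml);
try(java.io.OutputStream os = new java.io.FileOutputStream("/tmp/final-pmml-file")){
	org.jpmml.model.PMMLUtil.marshal(pmml, os);
}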
If you open the final PMML file, you can see that the field "target" (and all references to it) has been renamed to "y". Repeat the procedure for each feature field ("field_0", "field_1", and so on).
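If adding JPMML-Model as a dependency is not an option, the same renaming can be approximated with the JDK's built-in DOM API. This is a swapped-in technique, not what FieldRenamer does: it only rewrites "name" and "field" attributes whose value matches a dummy name, under the assumption that these are the only attributes through which Spark MLlib's output refers to fields (DataField, MiningField and NumericPredictor use "name"; ClusteringField uses "field"). The class PmmlDomRenamer and the method renameFields are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PmmlDomRenamer {

    // Assumption: the attributes that carry field names in Spark MLlib's PMML output
    private static final String[] FIELD_ATTRIBUTES = {"name", "field"};

    // Hypothetical helper: rewrites every field-name attribute whose value
    // is a key of `renames`, then serializes the document back to a string.
    static String renameFields(String pmmlXml, Map<String, String> renames) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pmmlXml)));
            // "*" matches every element in the document
            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                for (String attribute : FIELD_ATTRIBUTES) {
                    if (!element.hasAttribute(attribute)) {
                        continue;
                    }
                    String newName = renames.get(element.getAttribute(attribute));
                    if (newName != null) {
                        element.setAttribute(attribute, newName);
                    }
                }
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not rewrite the PMML document", e);
        }
    }

    public static void main(String[] args) {
        String raw = "<PMML xmlns=\"http://www.dmg.org/PMML-4_2\" version=\"4.2\">"
                + "<DataDictionary numberOfFields=\"2\">"
                + "<DataField name=\"field_0\" optype=\"continuous\" dataType=\"double\"/>"
                + "<DataField name=\"target\" optype=\"continuous\" dataType=\"double\"/>"
                + "</DataDictionary></PMML>";
        System.out.println(renameFields(raw, Map.of("field_0", "x1", "target", "y")));
    }
}
```

Unlike FieldRenamer, this sketch knows nothing about the PMML schema, so an unrelated "name" attribute that happens to equal a dummy name would also be rewritten; FieldRenamer remains the safer choice whenever JPMML-Model is available.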