How to evaluate a PMML file using Python

Question

I have a PMML file, generated in Python, that contains a random forest classifier, and I need to test the model again in Python. How can I import the PMML file back into Python so that I can test the model on a new dataset?

I tried the titanium package, but it failed with an error because of a PMML version issue.

The expected output is the model's predicted values, so that I can verify the model's accuracy.

PredictFuture · Answer 1 · 2019-07-25T06:31:57.230

You can use PyPMML to load the PMML file in Python and then make predictions on a new dataset, for example:

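from pypmml import Model

# Load the model from the PMML file
model = Model.fromFile('the/pmml/file/path')
# data: the new dataset to score (accepted input types are listed below)
result = model.predict(data)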

The data can be a dict, a JSON string, or a Pandas Series or DataFrame.
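Since the goal in the question is to verify accuracy, here is a minimal sketch of how the predictions might be compared against known labels. The helper names `accuracy` and `predict_labels`, and the `label_key` parameter, are hypothetical and not part of PyPMML. The name of the output field that holds the predicted class depends on your model, so inspect the keys of one prediction result before relying on it.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty dataset")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def predict_labels(pmml_path, rows, label_key):
    """Score each row (a dict of input values) and collect the predicted label.

    label_key is an assumption: it must be the name of the output field
    that holds the predicted class in your particular PMML model.
    """
    # Imported lazily so that accuracy() can be used without PyPMML installed.
    from pypmml import Model

    model = Model.fromFile(pmml_path)
    return [model.predict(row)[label_key] for row in rows]


# Illustration with made-up labels (no PMML file needed for this part).
true_labels = [1, 0, 1, 1]
predicted_labels = [1, 1, 1, 0]
print(accuracy(true_labels, predicted_labels))
```

With a real model, `predicted_labels` would come from `predict_labels(...)` instead of the made-up list. If the labels come back in a different type than the training labels (for example strings instead of integers), convert them before comparing.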

edited Jul 25 '19 at 06:31

answered Jul 22 '19 at 15:02

PredictFuture


Thanks for the recommendation! I am still facing an issue: I have 3000 test data points stored in a DataFrame, and when I predict them using the PMML file, I get the same prediction value for every point. – jay Jul 22 '19 at 19:14
This issue could have several causes: the PMML model itself, the input data, or PyPMML. First, make predictions with your native Python model and check whether it also returns the same value for every point. If it does, check the model training process. Next, check whether your new dataset matches the PMML spec: open the PMML file in a text editor and you will see several DataField elements in the DataDictionary; check that the dataset contains all of those fields. If neither check reveals a problem, feel free to open an issue for PyPMML on GitHub. – PredictFuture Jul 23 '19 at 01:41
I am finally able to get predictions from the PMML file with PyPMML. Thank you! I have one question: my input file (and Python model) contains 33 variables, but the PMML file has only 27 input variables for making predictions. Does the PMML file keep only the most significant of the 33 variables? – jay Jul 23 '19 at 14:36
Yes. The model training process can involve feature selection, and PMML supports such scenarios by using only the active fields in the prediction computation. – PredictFuture Jul 24 '19 at 01:45
Thank you so much for your help!! – jay Jul 24 '19 at 14:38
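The two checks discussed in the comments (does the dataset contain every field the model expects, and which training columns did not make it into the PMML file) can be scripted rather than done by eye in a text editor. The following is a rough sketch using only the standard library. The function names and the stripped-down `SAMPLE_PMML` fragment are made up for illustration; a real random forest PMML file is much larger, but the DataDictionary and MiningSchema elements have the same shape.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened PMML fragment used only for this illustration.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary>
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="fare" optype="continuous" dataType="double"/>
    <DataField name="survived" optype="categorical" dataType="integer"/>
  </DataDictionary>
  <MiningModel functionName="classification">
    <MiningSchema>
      <MiningField name="survived" usageType="target"/>
      <MiningField name="age"/>
      <MiningField name="fare"/>
    </MiningSchema>
  </MiningModel>
</PMML>"""


def _local(tag):
    """Drop the XML namespace so the code does not depend on the PMML version."""
    return tag.rsplit("}", 1)[-1]


def pmml_fields(pmml_text):
    """Return (data_fields, active_fields, target_fields) found in the PMML text."""
    root = ET.fromstring(pmml_text)
    data_fields = [e.get("name") for e in root.iter() if _local(e.tag) == "DataField"]

    # The first MiningSchema in document order belongs to the top-level model.
    schema = next((e for e in root.iter() if _local(e.tag) == "MiningSchema"), None)
    active, targets = [], []
    for field in (schema if schema is not None else []):
        if _local(field.tag) != "MiningField":
            continue
        # usageType defaults to "active" when the attribute is absent.
        usage = field.get("usageType", "active")
        if usage == "active":
            active.append(field.get("name"))
        elif usage in ("target", "predicted"):
            targets.append(field.get("name"))
    return data_fields, active, targets


def compare_columns(columns, pmml_text):
    """Return (missing, unused): active fields absent from the dataset,
    and dataset columns that the model does not use."""
    _, active, targets = pmml_fields(pmml_text)
    columns = list(columns)
    missing = [f for f in active if f not in columns]
    unused = [c for c in columns if c not in active and c not in targets]
    return missing, unused


missing, unused = compare_columns(["age", "fare", "cabin", "survived"], SAMPLE_PMML)
print("missing from dataset:", missing)
print("not used by the model:", unused)
```

For a real file, read its contents (for example with `open(path, encoding="utf-8").read()`) and pass your DataFrame's column names as `columns`. A non-empty `missing` list is worth investigating first when every row gets the same prediction, and `unused` shows which of the original variables are absent from the model's inputs.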
