
I'm trying to save a Pipeline object as a PMML file, and Python throws a RuntimeError.

My Python version is 3.6, my sklearn2pmml version is 0.44.0, and my JDK version is 1.8.0_201.

All of these meet the package's prerequisites.

Here's what I have done so far (I'm not including the data loading and cleaning part).

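from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml import make_pmml_pipeline, sklearn2pmml

# X and Y come from the data loading and cleaning part that is not shown
logit_pipline = Pipeline([('vect', CountVectorizer(ngram_range=(1,2))), ('tfidf', TfidfTransformer(use_idf=True)), ('clf', LogisticRegression(C=11.3))])
pmml_pipeline = PMMLPipeline([("logit", logit_pipline)])
pmml_pipeline.fit(X, Y)

sklearn2pmml(pmml_pipeline, 'logit.pmml', with_repr=True)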

This is what happens when I run the last line above:

sklearn2pmml(pmml_pipeline, 'logit.pmml', with_repr=True)
Standard output is empty
Standard error:
Apr 30, 2019 11:59:04 AM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Apr 30, 2019 11:59:04 AM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 230 ms.
Apr 30, 2019 11:59:04 AM org.jpmml.sklearn.Main run
INFO: Converting..
Apr 30, 2019 11:59:04 AM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Expected an estimator object as the last step, got a transformer object (Python class sklearn.pipeline.Pipeline)
        at sklearn2pmml.pipeline.PMMLPipeline.getEstimator(PMMLPipeline.java:541)
        at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:93)
        at org.jpmml.sklearn.Main.run(Main.java:145)
        at org.jpmml.sklearn.Main.main(Main.java:94)

Exception in thread "main" java.lang.IllegalArgumentException: Expected an estimator object as the last step, got a transformer object (Python class sklearn.pipeline.Pipeline)
        at sklearn2pmml.pipeline.PMMLPipeline.getEstimator(PMMLPipeline.java:541)
        at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:93)
        at org.jpmml.sklearn.Main.run(Main.java:145)
        at org.jpmml.sklearn.Main.main(Main.java:94)

Traceback (most recent call last):

  File "<ipython-input-129-f5c307b4aaba>", line 1, in <module>
    sklearn2pmml(pmml_pipeline, 'logit.pmml', with_repr=True)

  File "C:\ProgramData\Anaconda3\lib\site-packages\sklearn2pmml\__init__.py", line 252, in sklearn2pmml
    raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")

RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

According to some people, this is a JDK compatibility issue: JDK versions 1.9 and above, or 1.6 and below, are said to cause this kind of failure. But since my JDK version is one that sklearn2pmml accepts, why am I getting this error?

dilip sundar
  • Very nice question for a newbie! Hope someone comes by and sheds some light on it. – GhostCat Apr 30 '19 at 07:18
  • Without seeing your "logit.pmml" file, an answer is not very likely. I think some configuration/step is missing in this file. – SirFartALot Apr 30 '19 at 08:47

1 Answer


As the underlying Java exception says, the sklearn2pmml.pipeline.PMMLPipeline class expects to be parameterized with a list of steps in which the final step holds an estimator object. In your case, you are parameterizing PMMLPipeline with a single-element list of steps, so the final (and only) step holds a Pipeline object, which is not an estimator object in this sense.

To fix the problem, simply get rid of the intermediate logit_pipline layer (what's the idea of wrapping a pipeline inside a pipeline anyway?).

For example, this would work:

logit_pipeline = PMMLPipeline([..])
logit_pipeline.fit(X, y)
sklearn2pmml(logit_pipeline, "logit.pmml")

This problem is completely unrelated to the JDK, Python, or Scikit-Learn version.
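
A fuller sketch of the flattened pipeline, reusing the three steps from the question, might look like the following. The toy `X` and `y` are made-up stand-ins for the asker's data, and the fallback import is only an assumption added so the sketch still runs where sklearn2pmml is not installed. The final `sklearn2pmml` call is left commented out because it needs a Java runtime, and it may still fail for other reasons (see the comments on this answer about `CountVectorizer`).

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression

try:
    # PMMLPipeline is a subclass of scikit-learn's Pipeline.
    from sklearn2pmml.pipeline import PMMLPipeline
except ImportError:
    # Assumption for illustration only: fall back to the plain Pipeline
    # so the sketch can run without sklearn2pmml installed.
    from sklearn.pipeline import Pipeline as PMMLPipeline

# Made-up toy data standing in for the asker's X and y.
X = ["good movie", "great film", "bad movie", "awful film"]
y = [1, 1, 0, 0]

# One flat list of steps: transformers first, the estimator last.
logit_pipeline = PMMLPipeline([
    ("vect", CountVectorizer(ngram_range=(1, 2))),
    ("tfidf", TfidfTransformer(use_idf=True)),
    ("clf", LogisticRegression(C=11.3)),
])
logit_pipeline.fit(X, y)

# The last step is now a classifier rather than a nested Pipeline.
final_step = logit_pipeline.steps[-1][1]
print(type(final_step).__name__)

# Requires Java on the PATH:
# from sklearn2pmml import sklearn2pmml
# sklearn2pmml(logit_pipeline, "logit.pmml")
```

The point of the sketch is only the shape of the step list: no step is itself a pipeline, so the converter finds an estimator in the last position.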

user1808924
  • I'm getting the same error even after making the changes you suggested. – dilip sundar Apr 30 '19 at 12:09
  • So, you still have one pipeline nested inside another? Simply get rid of one of those two pipelines (`logit_pipline` or `pmml_pipeline`), and change the type of the "survivor" to `PMMLPipeline`. – user1808924 Apr 30 '19 at 12:49
  • I got rid of the nested pipeline and tried to run the code. I'm still getting the same error. – dilip sundar May 02 '19 at 05:32
  • It cannot be the same error, because this error is about one pipeline nested inside the other pipeline. It's either a different error, or you still have two pipelines. Show your code. – user1808924 May 02 '19 at 07:39
  • Yes, you're right. My bad. I'm getting an error related to the CountVectorizer. Code: `logit_pipeline = PMMLPipeline([('vect', CountVectorizer(ngram_range=(1,2))), ('tfidf', TfidfTransformer(use_idf=True)), ('clf', LogisticRegression(C=11.3))])`, then `logit_pipeline.fit(X, y)`, then `sklearn2pmml(logit_pipeline, 'logit.pmml')`. Error: `SEVERE: Failed to convert java.lang.IllegalArgumentException: Attribute 'sklearn.feature_extraction.text.CountVectorizer.tokenizer' has a missing (None/null) value` – dilip sundar May 02 '19 at 08:37
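
The thread does not resolve this last error, but the message itself can be reproduced in plain scikit-learn terms: a `CountVectorizer` built as in the question has `tokenizer=None`, which is the attribute the converter complains about. The sketch below shows that default and one possible way to set the attribute explicitly, using the `Splitter` class from `sklearn2pmml.feature_extraction.text`. Treat this as an untested assumption rather than a confirmed fix for sklearn2pmml 0.44.0; the `str.split` fallback and the two sample documents are made up so the sketch runs without sklearn2pmml installed.

```python
from sklearn.feature_extraction.text import CountVectorizer

# As in the question: no tokenizer is passed, so the attribute is None.
vect = CountVectorizer(ngram_range=(1, 2))
print(vect.tokenizer)

try:
    # Tokenizer class shipped with sklearn2pmml for text features.
    from sklearn2pmml.feature_extraction.text import Splitter
    tokenizer = Splitter()
except ImportError:
    # Assumption for illustration only: a plain whitespace split so the
    # sketch runs without sklearn2pmml installed.
    tokenizer = str.split

# token_pattern is ignored once a tokenizer is supplied, so it is set to None.
vect_explicit = CountVectorizer(
    ngram_range=(1, 2),
    tokenizer=tokenizer,
    token_pattern=None,
)

# Made-up sample documents, just to show the vectorizer still works.
counts = vect_explicit.fit_transform(["good movie", "bad movie"])
print(counts.shape)
```

If this approach is used, `vect_explicit` would take the place of the `'vect'` step in the flat `PMMLPipeline`.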