I recently came across sklearn2pmml and jpmml-sklearn while looking for a way to convert scikit-learn models to PMML. However, the basic usage example fails with errors that I haven't been able to figure out.

When running the usage example from sklearn2pmml, I get the following error about casting a Long to an Integer:

Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
    at numpy.core.NDArrayUtil.getShape(NDArrayUtil.java:66)
    at org.jpmml.sklearn.ClassDictUtil.getShape(ClassDictUtil.java:92)
    at org.jpmml.sklearn.ClassDictUtil.getShape(ClassDictUtil.java:76)
    at sklearn.linear_model.BaseLinearClassifier.getCoefShape(BaseLinearClassifier.java:144)
    at sklearn.linear_model.BaseLinearClassifier.getNumberOfFeatures(BaseLinearClassifier.java:56)
    at sklearn.Classifier.createSchema(Classifier.java:50)
    at org.jpmml.sklearn.Main.run(Main.java:104)
    at org.jpmml.sklearn.Main.main(Main.java:87)
Traceback (most recent call last):
  File "C:\Users\user\workspace\sklearn_pmml\test.py", line 40, in <module>
    sklearn2pmml(iris_classifier, iris_mapper, "LogisticRegressionIris.pmml")
  File "C:\Python27\lib\site-packages\sklearn2pmml\__init__.py", line 49, in sklearn2pmml
    os.remove(dump)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\user\\appdata\\local\\temp\\tmpmxyp2y.pkl'

Any suggestions as to what is going on here?

Usage code:

#
# Step 1: feature engineering
#

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

import pandas
import sklearn_pandas

iris = load_iris()

iris_df = pandas.concat((pandas.DataFrame(iris.data[:, :], columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]), pandas.DataFrame(iris.target, columns = ["Species"])), axis = 1)

iris_mapper = sklearn_pandas.DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], PCA(n_components = 3)),
    ("Species", None)
])

iris = iris_mapper.fit_transform(iris_df)

#
# Step 2: training a logistic regression model
#

from sklearn.linear_model import LogisticRegressionCV

iris_X = iris[:, 0:3]
iris_y = iris[:, 3]

iris_classifier = LogisticRegressionCV()
iris_classifier.fit(iris_X, iris_y)

#
# Step 3: conversion to PMML
#

from sklearn2pmml import sklearn2pmml

sklearn2pmml(iris_classifier, iris_mapper, "LogisticRegressionIris.pmml")

EDIT 12/6: After the new update, the same issue comes up farther down the line:

Dec 06, 2015 5:56:49 PM sklearn_pandas.DataFrameMapper updatePMML
INFO: Updating 1 target field and 3 active field(s)
Dec 06, 2015 5:56:49 PM sklearn_pandas.DataFrameMapper updatePMML
INFO: Mapping target field y to Species
Dec 06, 2015 5:56:49 PM sklearn_pandas.DataFrameMapper updatePMML
INFO: Mapping active field(s) [x1, x2, x3] to [Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]
Traceback (most recent call last):
  File "C:\Users\user\workspace\sklearn_pmml\test.py", line 40, in <module>
    sklearn2pmml(iris_classifier, iris_mapper, "LogisticRegressionIris.pmml")
  File "C:\Python27\lib\site-packages\sklearn2pmml\__init__.py", line 49, in sklearn2pmml
    os.remove(dump)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\user\\appdata\\local\\temp\\tmpqeblat.pkl'
Noit

1 Answer

JPMML-SkLearn expected ndarray.shape to be a tuple of i4 values (mapped to java.lang.Integer by the Pyrolite library). However, in this case it was a tuple of i8 values (mapped to java.lang.Long). Hence the cast exception.
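To see which integer types your own platform produces, a small diagnostic along these lines may help. This is only a sketch, not part of JPMML-SkLearn or sklearn2pmml: the helper names `shape_element_types` and `platform_integer_report` are made up for illustration, and the zero-filled array merely stands in for a fitted coefficient matrix. It reports the Python type of each shape entry together with a few integer widths that can differ between operating systems.

```python
from __future__ import print_function

import ctypes

import numpy as np


def shape_element_types(array):
    """Return the Python type name of each entry in array.shape."""
    return [type(dim).__name__ for dim in array.shape]


def platform_integer_report(array):
    """Collect integer-related facts that may differ between platforms.

    Hypothetical helper for illustration; not an API of any library.
    """
    return {
        "shape": tuple(int(dim) for dim in array.shape),
        "shape_element_types": shape_element_types(array),
        # Size of a C long in bytes on this platform.
        "c_long_bytes": ctypes.sizeof(ctypes.c_long),
        # Width of NumPy's default integer dtype on this installation.
        "numpy_default_int_bytes": np.dtype(int).itemsize,
        # Width of the pointer-sized integer NumPy uses for indexing.
        "numpy_intp_bytes": np.dtype(np.intp).itemsize,
    }


# Stand-in for a fitted coefficient matrix (3 classes x 3 features).
coef = np.zeros((3, 3))
report = platform_integer_report(coef)
for key in sorted(report):
    print(key, "=", report[key])
```

Running this on both the machine that fails and one that works, and comparing the output, is one way to gather the details for an issue report.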

This issue has been addressed in JPMML-SkLearn commit f7c16ac2fb.

If you encounter another exception (data translation between platforms can be tricky), please open a JPMML-SkLearn issue about it as well.

user1808924
  • The script ran farther down the line this time but a similar issue came up. Is that a similar fix later down or should I open an issue with JPMML-Sklearn? – Noit Dec 07 '15 at 02:05
  • Please open an issue with JPMML-SkLearn and post your exception stack trace there. The project is known to build and run correctly on 64-bit Linux, but you appear to be using (64-bit?) Windows. It looks like these two platforms default to different data types somewhere in the Scikit-Learn/Numpy/Python stack. – user1808924 Dec 07 '15 at 08:26