It looks like your PMMLPipeline is indented incorrectly, and you most likely don't need DataFrameMapper at all, because it is (according to the help page):

DataFrameMapper, a class for mapping pandas data frame columns to
different sklearn transformations

You are applying the same transformations to every column, so there is nothing for it to map.
Set up an example dataset like:
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

features = 'ABCDEFGHIJKLMNO'
X = pd.DataFrame(np.random.uniform(0, 1, (50, 15)),
                 columns=list(features))
y = np.random.binomial(1, 0.5, 50)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
Running the corrected code works fine:
for i in range(len(features)):
    pipeline = PMMLPipeline([
        ('pca', PCA(n_components=3)),
        ('classifier', DecisionTreeClassifier())
    ])
    pipeline.fit(X_train.drop(features[i], axis=1), y_train)
    result = pipeline.predict(X_test.drop(features[i], axis=1))
    print("Dropped feature: {}, Accuracy: {}".format(
        features[i], accuracy_score(y_test, result)))
Dropped feature: A, Accuracy: 0.9333333333333333
Dropped feature: B, Accuracy: 0.6
Dropped feature: C, Accuracy: 0.7333333333333333
Dropped feature: D, Accuracy: 0.6
Dropped feature: E, Accuracy: 0.6666666666666666
Dropped feature: F, Accuracy: 0.6666666666666666
Dropped feature: G, Accuracy: 0.6
Dropped feature: H, Accuracy: 0.8
Dropped feature: I, Accuracy: 0.6666666666666666
Dropped feature: J, Accuracy: 0.6666666666666666
Dropped feature: K, Accuracy: 0.7333333333333333
Dropped feature: L, Accuracy: 0.8
Dropped feature: M, Accuracy: 0.6
Dropped feature: N, Accuracy: 0.8
Dropped feature: O, Accuracy: 0.6666666666666666