When I try to get the PMML code out of my models in R I get the following error:

Error in datypelist[[namelist[ndf2][[1]]]] : subscript out of bounds

Here is the code that produces the error:

dim(train)
[1] 6963   31
model <- glm(trainLabels ~.,family=binomial(logit),data=train)
summary(model)
# export as PMML
library(pmml)
glm.pmml <- pmml(model)
Error in datypelist[[namelist[ndf2][[1]]]] : subscript out of bounds

Here is the sample code that doesn't give any error:

library(nnet)
library(pmml)
data(iris)
multinom = multinom(Species ~ Sepal.Length + Sepal.Width 
                    + Petal.Length + Petal.Width, data = iris)
pm<-pmml(multinom)
pm # returns an xml output in the console

What am I doing wrong? Is the error caused by the size of the data? Please help.

UPDATE:

Following the suggestions from @Laterow and @Tridi, I looked at the structure of the train set and found a character vector in it:

str(train)
'data.frame':   6963 obs. of  31 variables:
$ YearOfBirth_Grouped              : chr  "3_1942-51" "4_1952-61" "7_ >=1982" "4_1952-61" ...
$ Avg_BMI_Transcript               : num  26 21 22 26 28 30 30 25 21 20 ...
$ Avg_Temperature_Transcript       : num  98.1 96.8 98.3 97.7 98.4 97.6 0 95 97 97.5 ...
$ Gender                           : Factor w/ 2 levels "F","M": 2 1 1 1 2 2 1 2 1 1 ...
$ trainLabels                      : int  1 0 0 0 0 0 0 0 0 0 ...

I then converted that character vector into a factor, after which pmml worked fine:

train$YearOfBirth_Grouped <- as.factor(train$YearOfBirth_Grouped)
str(train)
'data.frame':   6963 obs. of  31 variables:
$ YearOfBirth_Grouped              : Factor w/ 7 levels "1_<=1931","2_1932-41",..: 3 4 7 4 7 6 7 3 3 4 ...
$ Avg_BMI_Transcript               : num  26 21 22 26 28 30 30 25 21 20 ...
$ Avg_Temperature_Transcript       : num  98.1 96.8 98.3 97.7 98.4 97.6 0 95 97 97.5 ...
$ Gender                           : Factor w/ 2 levels "F","M": 2 1 1 1 2 2 1 2 1 1 ...
$ trainLabels                      : int  1 0 0 0 0 0 0 0 0 0 ...
SoakingHummer
  • Alright, it's pretty hard to do this without any data, but could you provide `str(train)`? Also, if it's possible, could you provide a reproducible example, i.e. provide some minimal data with which you can reproduce the error? I tried reproducing the error myself with other data, but `pmml` worked just fine – slamballais Feb 26 '16 at 15:09
  • Thanks @Laterow and Tridi, I have updated the data format. Sorry, my bad for posting such a simple problem in this community. – SoakingHummer Mar 03 '16 at 08:45
  • Should I delete this question, or should it stay? I think it's very simple. Please suggest. – SoakingHummer Mar 08 '16 at 05:56
  • Hi @Nag, you can leave it as it is already pretty far down the list, so it's not in the way of anything. – slamballais Mar 08 '16 at 11:06

1 Answer

Yes, I ran into the same error message when converting an R script for a predictive model, which also included text-mining code, to PMML.

I solved the problem by converting my character features to factors. After that the PMML error disappeared and I was able to generate the PMML successfully.
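If there are several character columns, they can all be converted in one pass before fitting. The following is only a minimal sketch: the small `train` data frame is made-up toy data that borrows column names from the question, and the export step is skipped if the pmml package is not installed.

```r
# Hypothetical toy data standing in for the real `train` set (not the asker's data).
set.seed(1)
train <- data.frame(
  YearOfBirth_Grouped = sample(c("3_1942-51", "4_1952-61", "7_ >=1982"),
                               60, replace = TRUE),
  Avg_BMI_Transcript  = round(rnorm(60, mean = 25, sd = 3)),
  trainLabels         = rbinom(60, 1, 0.4),
  stringsAsFactors    = FALSE
)

# Find every character column and convert it to a factor.
is_chr <- vapply(train, is.character, logical(1))
train[is_chr] <- lapply(train[is_chr], as.factor)

# Fit the model only after the conversion.
model <- glm(trainLabels ~ ., family = binomial(logit), data = train)

# Export as PMML, but only if the pmml package is available.
if (requireNamespace("pmml", quietly = TRUE)) {
  glm.pmml <- pmml::pmml(model)
}
```

Running `str(train)` after the conversion should show `Factor` instead of `chr` for the affected columns, as in the updated output in the question.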

Uwe