
I am using the smbinning R package to compute the information value of the variables in my dataset.

The function smbinning() is pretty simple and is used as follows:

result = smbinning(df= dataframe, y= "target_variable", x="characteristic_variable", p = 0.05)

Here df is the dataset you want to analyse, y is the target variable, and x is the variable whose information value statistics you want to compute. I named the characteristic variables z1, z2, ..., z417 so that I could use a for loop to automate the whole analysis.

I tried to use the following for loop:

for (i in 1:417) {
 result = smbinning(df=DATA, y = "FLAG", x = "DATA[,i]", p=0.05)
  }

in order to compute the information value for the variable in column i of the data frame.

The class of DATA is "data.frame", while the class of result is "character".

So my question is: how can I compute the information value of each variable and store it in the object named result?

Thanks! Any help will be appreciated!

QuantumGorilla
  • Perhaps you need `... x = names(DATA)[i], ...` or `... x = DATA[, i], ...`? `"DATA[, i]"` won't treat "i" differently in each iteration, as it is just part of a string. E.g. compare `for(i in 1:3) print(paste("var_i"))` with `for(i in 1:3) print(paste("var_", i, sep = ""))`. And, depending on what "result" is, you'll need something like `result[i]` or `result[i, ]` or `result[, i]`... – alexis_laz Feb 06 '16 at 16:45
  • Hi @alexis_laz and thanks for the comment! By construction, result is a list of 7 elements called result$Cutpoint, result$CntRec, ...; what I particularly need is the IV value, which is found in result$iv, and I want to store that. – QuantumGorilla Feb 06 '16 at 17:53

2 Answers


Since no sample data is provided, I can only hazard a guess that the following will work:

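results_list = list()
for (i in 1:417) {
    # build the column name ("z1", "z2", ...) and pass it as a string
    current_var = paste0('z', i)
    current_result = smbinning(df=DATA, y = "FLAG", x = current_var, p=0.05)
    # keep only the information value of the i-th variable
    results_list[[i]] = current_result$iv
}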
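
The question reports a result of class "character", which suggests that smbinning() can hand back a message string instead of a list, and `$iv` on a string raises an error. A minimal sketch of a guarded version of the loop follows; `extract_iv` and `compute_ivs` are hypothetical helper names, not part of the smbinning package, and the sketch assumes `iv` is a single number when binning succeeds.

```r
# Hypothetical helper (not part of smbinning): return the IV from one
# smbinning() result, or NA when the result is not a list with a single iv.
extract_iv <- function(res) {
  if (is.list(res) && length(res$iv) == 1) as.numeric(res$iv) else NA_real_
}

# Assumes the data frame, target column and z1 ... z417 columns exist as
# described in the question, and that the smbinning package is installed.
compute_ivs <- function(data, target, vars, p = 0.05) {
  # named numeric vector, one slot per variable, NA until filled
  ivs <- setNames(rep(NA_real_, length(vars)), vars)
  for (v in vars) {
    ivs[[v]] <- extract_iv(smbinning::smbinning(df = data, y = target, x = v, p = p))
  }
  ivs
}

# Usage: iv_by_var <- compute_ivs(DATA, "FLAG", paste0("z", 1:417))
```

The result is a named numeric vector, so `iv_by_var["z12"]` looks up one variable and `sort(iv_by_var, decreasing = TRUE)` ranks them.
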
Tchotchke

You could try one of the apply functions, iterating over the z column names. The x argument to smbinning() should be the column name, not the column itself.

# simplify = FALSE keeps one full smbinning() result per variable in a named list
results = sapply(paste0("z",1:417), function(foo) {
   smbinning(df=DATA, y = "FLAG", x = foo, p=0.05)
}, simplify = FALSE)
class(results) # should be "list"
length(results) # should be 417
names(results) # should be z1,...
results[[1]] # should be the first result, so you can also iterate by indexing

I tried the following, since you had not provided any data:

> XX=c("IncomeLevel","TOB","RevAccts01")
> res = sapply(XX, function(z) smbinning(df=chileancredit.train,y="FlagGB",x=z,p=0.05))
Warning message:
NAs introduced by coercion 
> class(res)
[1] "list"
> names(res)
[1] "IncomeLevel" "TOB"         "RevAccts01"
> res$TOB
...
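
To get from a named list like `res` to the IV values the question asks for, one possible sketch is to collapse it into a data frame. `summarise_results` is a hypothetical helper name; it assumes each list element is one complete smbinning() return value, either a list carrying a single `iv` or a character message when no binning was produced.

```r
# Hypothetical helper: one row per variable with its IV, plus any message
# returned in place of a binning result.
summarise_results <- function(results) {
  data.frame(
    variable = names(results),
    iv = vapply(results, function(r) {
      if (is.list(r) && length(r$iv) == 1) as.numeric(r$iv) else NA_real_
    }, numeric(1)),
    note = vapply(results, function(r) {
      if (is.list(r)) "" else as.character(r)[1]
    }, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage: iv_table <- summarise_results(res)
```

Keeping the `note` column makes it easy to see which variables produced no binning and why.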

HTH

Dinesh