I want to compute class predictions from several models at several different probability cut-offs. The model predictions are stored in a list.

set.seed(101)  # set the seed before any random draw so y and the split are reproducible
df <- iris
df$y <- sample(0:1, nrow(df), replace = TRUE)
# Select 80% of the rows as the training sample
sample <- sample.int(n = nrow(df), size = floor(.8 * nrow(df)), replace = FALSE)
train <- df[sample, ]
test  <- df[-sample, ]

Then I fit several logistic models:

full <- glm(y ~ ., data = train, family = "binomial")
min <- glm(y ~ 1, data = train, family = "binomial")

backward <- step(full, direction = "backward", trace = 0)
forward <- step(min, scope = list(lower = min, upper = full), direction = "forward", trace = 0)
model2 <- glm(y ~ Sepal.Length + Sepal.Width, data = train, family = "binomial")

models <- list(backward, forward, model2)
prediction <- lapply(models, function(x) predict(x, newdata = test, type = "response"))

This gives me a list with one vector of predicted probabilities per model. Then I created a vector with the probability cut-offs I want to try.

    p <- seq(from = 0.1, to = 0.9, by = 0.5) 

The problem is that I want to apply every cut-off to the predictions of every model. I tried the map2 function from the purrr package, but it doesn't work.

    library(purrr)
    pred <- map2(prediction, p, function(x, pi) ifelse(x > pi, 1, 0))

It fails with: Error: .x (3) and .y (2) are different lengths

Can anyone help?

I think it might be better to change lapply to sapply, so that I get a matrix (one column per model) instead of a list.

    prediction <- sapply(models, function(x) predict(x, newdata = test, type = "response"),
                         simplify = TRUE, USE.NAMES = TRUE)

Could I then use the pmap function? Thanks.

EDIT: I updated the question with the full code.

liguang
  • It would help if you provided an example `test` dataset and an example `models` variable in your example code. – Ramiro Magno Jan 26 '19 at 01:42
  • @plant I updated with code. Thanks – liguang Jan 26 '19 at 12:12
  • `data` is not defined in your example code, so `nrow(data)` gives `NULL`. – Ramiro Magno Jan 26 '19 at 12:16
  • @plant Sorry, I forgot to change it. You can try now. – liguang Jan 26 '19 at 12:49
  • Two things: with the actual code I get no error, and are you sure you want to use `p` instead of `pi` in the function passed to `map2`? – Ramiro Magno Jan 26 '19 at 13:01
  • @plant Yes, sorry, it is `pi`. – liguang Jan 26 '19 at 13:07
  • Are you sure the problem still persists? – Ramiro Magno Jan 26 '19 at 13:14
  • @plant I just added a new model to the `models` list and now it doesn't work. – liguang Jan 26 '19 at 13:21
  • Ok, you have to realise what `map2` is for. `map2` maps a function over pairs of inputs. Each pair is formed by position: the first element of `prediction` is used with the first element of `p`, the second with the second, and so on. `map2` does not build all pairwise combinations of the elements of `prediction` and `p`. Thus, `map2` expects `prediction` and `p` to have the same length (or to be recyclable). Look at `outer` and related functions to achieve what you want. – Ramiro Magno Jan 26 '19 at 13:32
  • @plant I found the `cross` function, but I don't know how to apply it in my example. – liguang Jan 26 '19 at 14:54

1 Answer

See if this makes sense:

set.seed(101)  # set the seed before any random draw so y and the split are reproducible
df <- iris
df$y <- sample(0:1, nrow(df), replace = TRUE)
# Select 80% of the rows as the training sample
sample <- sample.int(n = nrow(df), size = floor(.8 * nrow(df)), replace = FALSE)
train <- df[sample, ]
test  <- df[-sample, ]

full <- glm(y~., data = train, family = "binomial")
min <- glm( y~ 1, data = train, family = "binomial")

backward <- step(full,direction = "backward",trace=0)
forward <- step(min,scope=list(lower=min, upper=full),direction = "forward",trace=0)
model2<- glm(y~Sepal.Length+Sepal.Width , data = train, family = "binomial")

models<-list(backward,forward,model2)
prediction<- lapply(models, function(x){predict(x,newdata=test,type="response")})

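p <- seq(from = 0.1, to = 0.9, by = 0.5)

library(purrr)

# All (model prediction, cut-off) pairs: 3 models x 2 cut-offs = 6 combinations
combn <- cross2(prediction, p)

# Apply the cut-off of each pair to the predictions of that pair
pred <- map(combn,
            function(combination) {
              x <- combination[[1]]   # predicted probabilities of one model
              pi <- combination[[2]]  # one cut-off
              ifelse(x > pi, 1, 0)
            }
)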
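
Note that `cross2()` has been deprecated since purrr 1.0.0, and its documentation points to `tidyr::expand_grid()` instead. If you would rather not depend on either, the same "every model with every cut-off" idea can be sketched in base R. The small `prediction` list and `p` vector below are made-up stand-ins for the objects in the question, and the element names are only for illustration:

```r
# Toy stand-ins (hypothetical values) for the question's `prediction` and `p`
prediction <- list(
  backward = c(0.20, 0.70, 0.95),
  forward  = c(0.05, 0.55, 0.65),
  model2   = c(0.40, 0.50, 0.80)
)
p <- c(0.1, 0.6)

# Every (model, cut-off) index pair: 3 models x 2 cut-offs = 6 rows
grid <- expand.grid(model = seq_along(prediction), threshold = seq_along(p))

# Map walks the two index columns in parallel, one call per row of `grid`
pred <- Map(
  function(i, j) as.integer(prediction[[i]] > p[[j]]),
  grid$model, grid$threshold
)

# Label each result with its model and cut-off, e.g. "model2_p0.6"
names(pred) <- paste0(names(prediction)[grid$model], "_p", p[grid$threshold])

length(pred)
```

The pairing is still positional, as with `map2`, but here both inputs are columns of `grid`, so they always have the same length.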
Ramiro Magno
  • Is it possible to return a data frame, using `map_dfr`? I changed the function to: `pred <- map(combn, function(combination) { x <- combination[[1]]; pi <- combination[[2]]; y <- ifelse(x > pi, 1, 0); exit <- c(pi, y); names(exit) <- c('pi', 'y'); return(exit) })`. It is just an example. I would like to add sensitivity, error rate, and so on. – liguang Jan 29 '19 at 13:04
  • What is the meaning of the rows and columns in that dataframe of yours? – Ramiro Magno Jan 29 '19 at 13:54
  • You can see it explained better in https://stackoverflow.com/questions/54422000/purrr-family-functions-like-ldply?noredirect=1#comment95654574_54422000. We can continue there. – liguang Jan 29 '19 at 14:47
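
One possible shape for the data frame asked about in the comments above is one row per model and cut-off, with the metrics as columns. The sketch below uses base R only. The `truth` labels, the toy probabilities, and the helper name `score` are all hypothetical; with the question's data, `truth` would presumably be `test$y`.

```r
# Hypothetical toy data: true labels and predicted probabilities per model
truth <- c(0, 1, 1, 0)
prediction <- list(
  backward = c(0.20, 0.70, 0.95, 0.30),
  model2   = c(0.40, 0.50, 0.80, 0.65)
)
p <- c(0.1, 0.6)

# One row per (model, cut-off) pair
grid <- expand.grid(model = names(prediction), threshold = p,
                    stringsAsFactors = FALSE)

# Hypothetical helper: metrics for one pair, returned as a one-row data frame
score <- function(model, threshold) {
  y_hat <- as.integer(prediction[[model]] > threshold)
  data.frame(
    model       = model,
    threshold   = threshold,
    error_rate  = mean(y_hat != truth),
    sensitivity = sum(y_hat == 1 & truth == 1) / sum(truth == 1)
  )
}

# Row-bind the per-pair results into a single data frame
result <- do.call(rbind, Map(score, grid$model, grid$threshold))
rownames(result) <- NULL
result
```

Returning a one-row data frame from the helper is what makes the row-binding step work; with purrr, the same helper could be passed to `map2_dfr(grid$model, grid$threshold, score)`.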