I have written code that computes the two lines I am asking about; they are shown in red in the image below.
EDIT: This is the expected graph, using my snippet below to generate the ROC curves (at least, I'm pretty sure it is right):
The problem is that this code is very ugly (too long to even post here), and the process I came up with seems extremely tedious. Yet I can't come up with anything better.
Here is a quick snippet that produces the input: a list of 100 ROC curves, one per random 70/30 train/test split of the Pima diabetes data:
library(MASS)   # Pima.tr / Pima.te data sets
library(dplyr)  # recode_factor()

# Empirical ROC curve: sort the 0/1 labels by decreasing score, then
# accumulate the true and false positive rates; (0, 0) is prepended.
simple_roc <- function(labels, scores){
  labels <- labels[order(scores, decreasing=TRUE)]
  return(rbind(c(0,0,0),data.frame(TPR=cumsum(labels)/sum(labels), FPR=cumsum(!labels)/sum(!labels), labels)))
}

diab_data=rbind(data.frame(Pima.tr),data.frame(Pima.te))
roc_curves_list_logisitic=list()
for (k in 1:100) {
  # Set a fixed seed for reproducibility
  set.seed(k)
  # Random 70/30 train/test split
  # sampled_rows <- createDataPartition(diab_data$type, p = .7, list = FALSE)
  sampled_rows <- sample(1:nrow(diab_data), size=floor(0.7*nrow(diab_data)))
  diab_data_train=diab_data[sampled_rows,]
  diab_data_test=diab_data[-sampled_rows,]
  # Standardise the 7 predictors (each set is scaled with its own mean and sd)
  diab_data_train[,1:7]=scale(diab_data_train[,1:7])
  diab_data_test[,1:7]=scale(diab_data_test[,1:7])
  # Recode the outcome Yes/No as 1/0
  diab_data_train[,"type"]=as.numeric(as.character(recode_factor(diab_data_train[,"type"],`Yes` = "1", `No` = "0")))
  diab_data_test[,"type"]=as.numeric(as.character(recode_factor(diab_data_test[,"type"],`Yes` = "1", `No` = "0")))
  # Logistic regression of type on all seven predictors
  logistic_model_simple=glm(data=diab_data_train,as.formula(paste(colnames(diab_data_train)[8], "~",
      paste(colnames(diab_data_train)[-8], collapse = "+"),
      sep = "")),family=binomial(link = "logit"))
  # Note: the predicted probabilities are thresholded at 0.5 here, so each curve
  # has essentially one real operating point; the steps between it and (0,0)/(1,1)
  # only follow the order of tied scores. Passing the raw probabilities gives a full curve.
  roc_curves_list_logisitic[[k]]=simple_roc(diab_data_test[,"type"],
      ifelse(predict(logistic_model_simple,diab_data_test,type='response')>0.5,1,0))
}
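To make the input format concrete, here is what simple_roc returns for a tiny hand-made example (the labels and scores are made up for illustration). Each curve is a data frame with columns TPR, FPR and labels, running from (0, 0) to (1, 1), so both rates are non-decreasing down the rows, and a vertical segment shows up as a repeated FPR value. Any code that combines the curves has to cope with those repeats.

```r
# Same simple_roc() as in the snippet above, reformatted.
simple_roc <- function(labels, scores) {
  labels <- labels[order(scores, decreasing = TRUE)]
  rbind(c(0, 0, 0),
        data.frame(TPR = cumsum(labels) / sum(labels),
                   FPR = cumsum(!labels) / sum(!labels),
                   labels))
}

# Made-up toy input: 2 positives, 2 negatives, distinct scores.
toy_roc <- simple_roc(labels = c(1, 0, 1, 0), scores = c(0.9, 0.8, 0.7, 0.1))
toy_roc$TPR  # -> 0.0 0.5 0.5 1.0 1.0
toy_roc$FPR  # -> 0.0 0.0 0.5 0.5 1.0 (FPR = 0 repeated: a vertical step)
```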
I am asking whether anyone has a cleaner, "beautiful" way to produce the two red lines in this graph with ggplot2, starting from the list of ROC curves generated by the snippet above.
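On the ggplot2 side, one way to avoid looping over the list is to stack all curves into a single long data frame with a curve id, then draw them with one geom_path() layer. In the sketch below, the five random curves are a made-up stand-in for roc_curves_list_logisitic, and simple_roc() is copied from the question so the block runs on its own.

```r
library(ggplot2)

# simple_roc() copied from the question so this block runs on its own.
simple_roc <- function(labels, scores) {
  labels <- labels[order(scores, decreasing = TRUE)]
  rbind(c(0, 0, 0),
        data.frame(TPR = cumsum(labels) / sum(labels),
                   FPR = cumsum(!labels) / sum(!labels),
                   labels))
}

# Made-up stand-in for roc_curves_list_logisitic: 5 curves on random data.
set.seed(1)
roc_list <- lapply(1:5, function(k) {
  labels <- rbinom(50, 1, 0.4)
  simple_roc(labels, labels + rnorm(50))  # noisy but informative scores
})

# Stack the curves into one long data frame with a curve id column.
roc_long <- do.call(rbind, Map(function(roc, id) cbind(roc, curve = id),
                               roc_list, seq_along(roc_list)))

# One layer draws every curve; geom_path() joins points in row order.
p <- ggplot(roc_long, aes(x = FPR, y = TPR, group = curve)) +
  geom_path(colour = "grey60") +
  geom_abline(linetype = "dashed") +  # chance diagonal
  coord_equal()
print(p)
```

With the real data, roc_list would simply be roc_curves_list_logisitic. geom_path() draws the points in the order simple_roc() produces them, and any bound lines can be added as extra layers with their own `data` argument.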
Preferably, I would like to end up with two data frames, lower_bound_roc_curves and upper_bound_roc_curves, containing the values needed to plot the two lines separately if I need them.
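One possible way to get those two data frames, assuming the red lines are the pointwise minimum and maximum TPR across all curves at each FPR (a vertical envelope), is sketched below in base R. Each curve is read as a step function and evaluated on a common FPR grid with approx(). The helper name roc_envelope and the default 201-point grid are hypothetical choices for illustration, not something taken from the question.

```r
# Hypothetical helper: pointwise (vertical) envelope of a list of ROC curves.
# Assumes each element has FPR and TPR columns, both non-decreasing down the
# rows, as produced by simple_roc().
roc_envelope <- function(roc_list, fpr_grid = seq(0, 1, length.out = 201)) {
  # One column per curve: the curve's TPR at each grid FPR, read as a step
  # function (value at the largest FPR <= grid point). Repeated FPR values
  # (vertical segments) are collapsed to their maximum TPR, i.e. the top
  # of the jump.
  tpr_matrix <- sapply(roc_list, function(roc) {
    approx(x = roc$FPR, y = roc$TPR, xout = fpr_grid,
           method = "constant", ties = max, rule = 2)$y
  })
  list(
    lower_bound_roc_curves = data.frame(FPR = fpr_grid, TPR = apply(tpr_matrix, 1, min)),
    upper_bound_roc_curves = data.frame(FPR = fpr_grid, TPR = apply(tpr_matrix, 1, max))
  )
}

# Two made-up curves (the shapes simple_roc() returns for labels 1,0,1,0
# and 0,1,0,1 with decreasing scores), on a coarse grid for readability.
curve_a <- data.frame(TPR = c(0, 0.5, 0.5, 1, 1), FPR = c(0, 0, 0.5, 0.5, 1))
curve_b <- data.frame(TPR = c(0, 0, 0.5, 0.5, 1), FPR = c(0, 0.5, 0.5, 1, 1))
bounds <- roc_envelope(list(curve_a, curve_b), fpr_grid = c(0, 0.25, 0.5, 0.75, 1))
bounds$lower_bound_roc_curves$TPR  # -> 0.0 0.0 0.5 0.5 1.0
bounds$upper_bound_roc_curves$TPR  # -> 0.5 0.5 1.0 1.0 1.0
```

With the list from the question, `bounds <- roc_envelope(roc_curves_list_logisitic)` would give both data frames, and each could be drawn as its own layer, e.g. `geom_step(data = bounds$lower_bound_roc_curves, aes(FPR, TPR), colour = "red")`. Taking the top of each vertical jump is a design choice; if the red lines were meant as something else (for example quantiles rather than min/max), the `min`/`max` in the `apply()` calls would change accordingly.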
Thanks in advance,
EDIT 2: @denis, here are some parts I think your code gets wrong: