I have the below conceptual problem which I can't get my head around.
Below is an example for survey data where I have a time column that indicates how long someone needs to respond to a certain question.
Now, I'm interested in how the amount of cleaning would change based on this threshold, i.e. what would happen if I increase the threshold, what would happen if I decrease it.
So my idea was to just create a ROC curve (or other model metrics) to have a visual cue about a potential threshold. The problem is that I don't have a machine-learning-like model that would give me class probabilities. So I was wondering if there's any way to create a ROC curve nonetheless with this type of data. I had the idea of just looping through my data at maybe 100 different thresholds, calculate false and true positive rates at each threshold and then do a simple line plot, but I was hoping for a more elegant solution that doesn't require me to loop.
Any ideas?
example data:
time
column indidates the time needed per casetruth
column indicates my current decision I want to compare againstpredicted
column indicates the cleaning decision if I would cut at a time threshold of2.5s
. This is waht I need to change/loop through.
set.seed(3)
df <- data.frame(time = c(2.5 + rnorm(5), 3.5 + rnorm(5)),
truth = rep(c("cleaned", "final"), each = 5)) %>%
mutate(predicted = if_else(time < 2.5, "cleaned", "final"))