
This runs fine when I hard-code the column names, but when I try to generalize it with `score` and `outcome` arguments it fails (see the end). Any idea how to do this? (I include the `indices` argument because I want to bootstrap this later.)

library(dplyr)   # provides %>%, filter(), mutate(), select()
library(PRROC)
df <- iris %>% filter(Species != "virginica") %>% mutate(outcome_versi = ifelse(Species == "versicolor", 1, 0)) %>% select(Sepal.Length, outcome_versi)

#Iris single AUC
fc <- function(data, indices){
  d <- data[indices,]
  versi.y <- d %>% filter(outcome_versi == 1) %>% select(Sepal.Length)
  versi.n <- d %>% filter(outcome_versi == 0) %>% select(Sepal.Length)
  prroc.sepal.length <- pr.curve(scores.class0 = versi.y$Sepal.Length, scores.class1 = versi.n$Sepal.Length, curve = TRUE)
  return(prroc.sepal.length$auc.integral)
}

fc(df)
#AUC = 0.94

#Iris single AUC - functionalized
fcf <- function(score, outcome, data, indices){
  d <- data[indices,]
  test.pos <- d %>% filter(outcome==1) %>% select(score)
  test.neg <- d %>% filter(outcome==0) %>% select(score)
  prroc.test <-pr.curve(scores.class0 = test.pos$score, scores.class1 = test.neg$score, curve=T)
  return(prroc.test$auc.integral)
}

fcf(data=df, score=Sepal.Length, outcome = outcome_versi)
#Error: 'outcome' not found
CineyEveryday
  • This is a non-standard evaluation ([NSE](https://dplyr.tidyverse.org/articles/programming.html)) problem. – Limey Feb 03 '22 at 14:03
  • Can you elaborate, where in the code is the NSE problem? – CineyEveryday Feb 03 '22 at 14:08
  • Within `filter(outcome==1) %>% select(score)`, the `outcome` and `score` bits. – Quixotic22 Feb 03 '22 at 14:24
  • Got it. So, is it not possible to filter and select data with dplyr as part of a function? Should I use base R instead? – CineyEveryday Feb 03 '22 at 15:12
  • It is possible, but you need to learn the techniques for doing it. The link I provided gives you a relevant tutorial. It would also be helpful if you provided a *minimal reproducible example*, including both input data and expected output. [This post](https://stackoverflow.com/help/minimal-reproducible-example) may help. – Limey Feb 03 '22 at 15:33

2 Answers


As I mentioned yesterday, this is a standard NSE problem, which is [almost] always encountered when programming in the tidyverse. The problem is caused by the fact that tidyverse allows you to write, for example,

iris %>% filter(Sepal.Length < 6)

All other things being equal, at the time filter() is called there is no object named Sepal.Length in the calling environment (it exists only as a column of iris), yet no error is thrown and the code works "as expected".
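To make the point concrete, here is a small sketch of the behaviour described above. The helper `keep_below` and its arguments are hypothetical names used only for illustration, and the `exists()` result assumes a fresh session in which nothing called `Sepal.Length` has been defined.

```r
library(dplyr)

# Sepal.Length is a column of iris, not an object in the global environment.
# In a fresh session this prints FALSE.
print(exists("Sepal.Length"))

# filter() looks the name up inside the data frame, so this still works.
small <- iris %>% filter(Sepal.Length < 6)

# Hypothetical wrapper: the column is now passed through an argument.
keep_below <- function(data, col, cutoff) {
  # `col` is not a column of `data`, so it is treated as an ordinary argument.
  # Forcing it evaluates Sepal.Length in the caller's environment, where no
  # such object exists, and the call fails.
  data %>% filter(col < cutoff)
}

res <- try(keep_below(iris, Sepal.Length, 6), silent = TRUE)
print(inherits(res, "try-error"))
```

The second call fails for the same reason as the function in the question: the wrapper needs to be told explicitly that its argument refers to a column.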

Here's how I deal with this in your situation. Note that I have removed the row-selection parameter (`indices`) from the function, because I feel this is more naturally handled by a call to filter earlier in the pipe. I have also moved data/d to be the first parameter of the function so that it fits more naturally into a pipe.

Also, I don't have the PRROC package, so I have commented out the call to it inside the function and replaced the original return value accordingly. Simply make the obvious changes to get the functionality you need. The solution to the NSE issue does not depend on access to PRROC.

library(magrittr)
library(dplyr)

fcf <- function(d, score=Sepal.Length, outcome = outcome_versi){
  qScore <- enquo(score)
  qOutcome <- enquo(outcome)

  test.pos <- d %>% filter(!! qOutcome == 1) %>% select(!! qScore)
  test.neg <- d %>% filter(!! qOutcome == 0) %>% select(!! qScore)
  # test.pos and test.neg are one-column tibbles; pull() extracts that column as a vector
  # prroc.test <- pr.curve(scores.class0 = pull(test.pos), scores.class1 = pull(test.neg), curve = TRUE)
  # return(prroc.test$auc.integral)
  return(list("pos"=test.pos, "neg"=test.neg))
}

# as_tibble simply to improve formatting
as_tibble(iris) %>% 
  mutate(outcome_versi = ifelse(Species == "versicolor", 1, 0)) %>% 
  fcf()

$pos
# A tibble: 50 × 1
   Sepal.Length
          <dbl>
 1          7  
 2          6.4
 3          6.9
 4          5.5
 5          6.5
 6          5.7
 7          6.3
 8          4.9
 9          6.6
10          5.2
# … with 40 more rows

$neg
# A tibble: 100 × 1
   Sepal.Length
          <dbl>
 1          5.1
 2          4.9
 3          4.7
 4          4.6
 5          5  
 6          5.4
 7          4.6
 8          5  
 9          4.4
10          4.9
# … with 90 more rows

And similarly,

set.seed(123)
as_tibble(iris) %>% 
   mutate(
    outcome_versi = ifelse(Species == "versicolor", 1, 0),
    RandomOutcome=runif(nrow(.)) > 0.5
  ) %>% 
  filter(Sepal.Length < 6) %>% 
  fcf(score=Petal.Width, outcome=RandomOutcome)

$pos
# A tibble: 40 × 1
   Petal.Width
         <dbl>
 1         0.2
 2         0.2
 3         0.2
 4         0.3
 5         0.2
 6         0.2
 7         0.2
 8         0.1
 9         0.1
10         0.4
# … with 30 more rows

$neg
# A tibble: 43 × 1
   Petal.Width
         <dbl>
 1         0.2
 2         0.2
 3         0.4
 4         0.1
 5         0.2
 6         0.2
 7         0.4
 8         0.3
 9         0.3
10         0.2
# … with 33 more rows
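For completeness, here is a sketch of how the function might look with the PRROC call restored and the `indices` argument from the question put back. This is untested against PRROC here, so treat it as an assumption rather than a verified solution. The `indices` default and the use of `pull()` are choices made for this sketch, not part of the code above.

```r
library(dplyr)

fcf <- function(d, indices = seq_len(nrow(d)),
                score = Sepal.Length, outcome = outcome_versi) {
  qScore   <- enquo(score)
  qOutcome <- enquo(outcome)

  # Row subset; by default all rows, otherwise e.g. a bootstrap resample
  d <- d[indices, ]

  # pull() returns plain vectors rather than one-column data frames
  pos <- d %>% filter(!!qOutcome == 1) %>% pull(!!qScore)
  neg <- d %>% filter(!!qOutcome == 0) %>% pull(!!qScore)

  PRROC::pr.curve(scores.class0 = pos, scores.class1 = neg,
                  curve = TRUE)$auc.integral
}

# Data as constructed in the question
df <- iris %>%
  filter(Species != "virginica") %>%
  mutate(outcome_versi = ifelse(Species == "versicolor", 1, 0)) %>%
  select(Sepal.Length, outcome_versi)

# Only run the PRROC part if the package is installed
if (requireNamespace("PRROC", quietly = TRUE)) {
  print(fcf(df))                                              # all rows
  print(fcf(df, indices = sample(nrow(df), replace = TRUE)))  # one resample
}
```

With all rows selected this performs the same computation as the hard-coded `fc(df)` in the question, which reports an AUC of about 0.94.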

Finally, if you want to use an enquoted variable on the left hand side of an assignment, then you need to use := rather than =.
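A minimal sketch of the `:=` case follows. The function `add_flag` and its arguments are hypothetical, and converting the captured name to a string with `rlang::as_name()` before unquoting is a choice made for this sketch.

```r
library(dplyr)

# Hypothetical helper: create a logical column whose name is supplied by the caller
add_flag <- function(d, new_col, source_col, cutoff) {
  new_name <- rlang::as_name(enquo(new_col))  # captured name as a string
  qSrc     <- enquo(source_col)

  # An unquoted name on the left-hand side requires := instead of =
  d %>% mutate(!!new_name := !!qSrc > cutoff)
}

out <- add_flag(iris, is_long, Sepal.Length, 6)
print(head(out$is_long))
```

Writing `mutate(!!new_name = ...)` instead is a syntax error, because R does not allow an expression such as `!!new_name` on the left of `=` in a function call.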

Limey

Does this work?

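fcf <- function(score, outcome, data, indices){
  d <- data[indices,]
  # score and outcome are column names passed as strings;
  # .data[[outcome]] looks up the column with that name
  test.pos <- d %>% filter(.data[[outcome]] == 1) %>% select(all_of(score))
  test.neg <- d %>% filter(.data[[outcome]] == 0) %>% select(all_of(score))
  # [[score]] extracts the column named by the string (test.pos$score would look for a column literally called "score")
  prroc.test <- pr.curve(scores.class0 = test.pos[[score]], scores.class1 = test.neg[[score]], curve = TRUE)
  return(prroc.test$auc.integral)
}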

fcf(data=df, score='Sepal.Length', outcome = 'outcome_versi')

I don't have the required package to test, but I assume the error occurs because you've asked for a column of the df by a name that isn't a variable in its own right.

N.B. if you have an older version of dplyr you might need to make use of rlang quasiquotation
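For the quasiquotation route mentioned above, one possible sketch converts the strings to symbols with `rlang::sym()` and unquotes them with `!!`. The helper name `split_by_name` is hypothetical, and the PRROC call is left out so that the example runs with dplyr alone.

```r
library(dplyr)

# Hypothetical helper: column names arrive as strings
split_by_name <- function(d, score, outcome) {
  score_sym   <- rlang::sym(score)    # "Sepal.Length"  -> Sepal.Length
  outcome_sym <- rlang::sym(outcome)  # "outcome_versi" -> outcome_versi

  list(
    pos = d %>% filter(!!outcome_sym == 1) %>% pull(!!score_sym),
    neg = d %>% filter(!!outcome_sym == 0) %>% pull(!!score_sym)
  )
}

# Data as constructed in the question
df <- iris %>%
  filter(Species != "virginica") %>%
  mutate(outcome_versi = ifelse(Species == "versicolor", 1, 0)) %>%
  select(Sepal.Length, outcome_versi)

parts <- split_by_name(df, score = "Sepal.Length", outcome = "outcome_versi")
print(lengths(parts))
```

The two vectors in `parts` can then be passed to `pr.curve()` as `scores.class0` and `scores.class1`.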

Quixotic22