Can someone explain what the controls and cases arguments mean in the roc() function from the pROC package in R, and how to use them? How can I check the number of controls and cases available in the dataset?

From help(roc):
controls, cases: instead of response, predictor, the data can be supplied as two numeric or ordered vectors containing the predictor values for control and case observations.
Usually the ROC curve is used in classification settings, where you have two vectors with one entry per observation: the true class of each observation (the response, typically a factor() in R) and the numeric or ordered score produced by your test or model (the predictor).
Other times your data is already split into groups (as in many medical studies), and instead of response and predictor you can give the function the predictor values of each group as two separate vectors, controls and cases (both numeric or ordered).
The controls are the observations without the condition you want to detect (for example, healthy patients), and the cases are the observations that have it.
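To make the two input forms concrete, here is a small sketch with hypothetical toy data (the df, marker and outcome names are made up for illustration). It splits one predictor vector into the controls and cases vectors with plain base R and, only if pROC is installed, builds the curve both ways with roc(); the two calls should give the same AUC.

```r
# Hypothetical toy data: one numeric marker and a two-class outcome.
# "Good" plays the role of the controls, "Poor" the role of the cases.
df <- data.frame(
  marker  = c(0.10, 0.40, 0.35, 0.80, 0.70, 0.20),
  outcome = c("Good", "Good", "Poor", "Poor", "Poor", "Good")
)

# Split the predictor values by class: these are the two numeric
# vectors that the controls= and cases= arguments expect.
controls <- df$marker[df$outcome == "Good"]
cases    <- df$marker[df$outcome == "Poor"]

# The pROC part only runs if the package is installed.
if (requireNamespace("pROC", quietly = TRUE)) {
  # Form 1: predictor values already split into two vectors
  roc_split <- pROC::roc(controls = controls, cases = cases)
  # Form 2: one response vector and one predictor vector;
  # levels gives the control level first and the case level second
  roc_long <- pROC::roc(response = df$outcome, predictor = df$marker,
                        levels = c("Good", "Poor"))
  print(pROC::auc(roc_split))
  print(pROC::auc(roc_long))
}
```

The split form is just a convenience: once you have two vectors, you no longer need a response column at all.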
Again, from the help page:
Data can be provided as response, predictor, where the predictor is the numeric (or ordered) level of the evaluated signal, and the response encodes the observation class (control or case). The levels argument specifies which response level must be taken as controls (first value of levels) or cases (second). It can safely be ignored when the response is encoded as 0 and 1, but it will frequently fail otherwise. By default, the first two values of levels(as.factor(response)) are taken, and the remaining levels are ignored. This means that if your response is coded “control” and “case”, the levels will be inverted.
In some cases, it is more convenient to pass the data as controls, cases, but both arguments are ignored if response, predictor was specified to non-NULL values. It is also possible to pass density data with density.controls, density.cases, which will result in a smoothed ROC curve even if smooth=FALSE, but are ignored if response, predictor or controls, cases are provided.
library(pROC)
data(aSAH)
# With numeric controls/cases
roc(controls=aSAH$s100b[aSAH$outcome=="Good"], cases=aSAH$s100b[aSAH$outcome=="Poor"])
# With ordered controls/cases
roc(controls=aSAH$wfns[aSAH$outcome=="Good"], cases=aSAH$wfns[aSAH$outcome=="Poor"])
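The inversion mentioned in the help text is easy to see with base R alone. This is a sketch with a hypothetical response/predictor pair: passing levels = c("control", "case") explicitly is the fix the help text implies, and direction = "<" (controls expected to have lower values than cases) is an assumption chosen for this toy data.

```r
# Hypothetical response coded with the words "control" and "case",
# plus a made-up numeric predictor for the same five observations.
response  <- c("control", "case", "control", "case", "control")
predictor <- c(0.2, 0.9, 0.3, 0.6, 0.1)

# as.factor() sorts the levels alphabetically, so "case" comes first.
# Because the first level is taken as the controls by default,
# the two groups would end up swapped.
default_levels <- levels(as.factor(response))
print(default_levels)  # "case" "control"

# State the order explicitly instead: controls first, cases second.
explicit_levels <- c("control", "case")

if (requireNamespace("pROC", quietly = TRUE)) {
  r <- pROC::roc(response = response, predictor = predictor,
                 levels = explicit_levels, direction = "<")
  print(r$levels)  # the levels pROC actually used
  print(pROC::auc(r))
}
```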

In binary classification, you always have two groups. One of these groups corresponds to the observations that have the thing you want to detect. Depending on your field of research it goes by different names, but common terms include hit, positive, or case.
By contrast, observations that don't have what you want to detect are labeled negative, miss, or control.
So in pROC this is called control and case, but you can think about it as negative and positive, respectively.
You don't need to check the number of controls and cases available. pROC will do this check for you, and the numbers that were actually used will be reported when you print the curve.
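If you still want to look at the counts yourself, a plain table() of the outcome works, but keep in mind that it counts every row, while roc() drops observations with missing values by default (assuming its na.rm argument is left at TRUE). A sketch with hypothetical data containing one NA:

```r
# Hypothetical toy data; the third marker value is missing.
outcome <- factor(c("Good", "Poor", "Good", "Good", "Poor", "Good"),
                  levels = c("Good", "Poor"))
marker  <- c(0.2, 0.9, NA, 0.4, 0.7, 0.1)

# Counts in the raw data, before anything is dropped
print(table(outcome))

# Counts among the observations with a non-missing predictor,
# which is what roc() should end up using with na.rm = TRUE
complete <- !is.na(marker)
print(table(outcome[complete]))

if (requireNamespace("pROC", quietly = TRUE)) {
  r <- pROC::roc(response = outcome, predictor = marker)
  print(r)  # the printout reports how many controls and cases were used
  # The roc object also stores the predictor values of each group
  print(c(controls = length(r$controls), cases = length(r$cases)))
}
```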

Is this about a particular case, or about how ROC curves work in general? If you are stuck on an ROC curve project in R, please post your code. It's easier to explain this with an example.
Please add a comment underneath the question if you don't answer. – Calimo Jul 14 '18 at 16:24