I have a quite complex question of how to write an R code, as I went through all the possible options known to me without any results. I can do the first part in excel within less than one hour, i.e., to get a result for the following formula.
C = 1/n ∑ nj(cj - aj)^2
n = number of observations, nj = number of observations in the interval, aj = proportion of correct responses for each class interval, ∑ is the sum of each j ratings to class interval.
The reason I want to do it in R is because (1) everything can be done once and with the same program, and--more important--(2) I need do jackknife (or bootstrap) this formula to get the jackknife SEs (which I need for the inferential confidence intervals).
This is the data
> str(data)
'data.frame': 372 obs. of 6 variables:
$ ID : int 120 432 664 163 283 659 78 150 158 188 ...
$ age : int 20 20 16 18 20 15 20 20 16 20 ...
$ sex : Factor w/ 2 levels "female","male": 1 1 1 2 2 2 1 1 1 2 ...
$ poconf123 : int 1 1 1 1 1 1 1 1 1 1 ...
$ PoConfPC : int 40 50 40 30 10 50 50 30 40 30 ...
$ idacc : Factor w/ 2 levels "accurate","inaccurate": 1 1 1 1 1 1 2 2 2 2 ...
To get the nj, cj and aj, I tried the following code
group_by(data, poconf123) %>% #this is to get results for each class intervall
summarise(poconf = mean(PoConfPC)) %>% #this is to get cj
summarise(NumAcc=n(), prop = mean(idacc==1)/n()) #this is to get nj and aj
Using the code above I get the following error message
Error in summarise_impl(.data, dots):
Evaluation error: object 'PoConfPC' not found.
The first two lines alone, or the first and the last lines alone do not produce the same error. However, the code
group_by(data, poconf123) %>%
summarise(NumAcc=n(), prop = mean(idacc==1)/n())
still seems not optimal, because instead of getting the proportions e.g., .53, I am getting the values "0." in the entire column 'prop'.
Of course, I could ask for a proportion table, e.g.,
prop.table(table(data$idacc))
to get the proportions for both accurate and inaccurate but this is not really what I need.
So, overall I do not even get to the point where I can run the analyses and get a single C, not to mention the jackknife SE. And this brings me to the actual question, that is, how I can program this formula into a function, so I can 'jackknife' it. Below is an example of how to jackknife the point-biserial correlation with the outputs required for inferential confidence intervals.
library(bootstrap)
xx = data$PoConfPC
yy = data$idacc
nn = length(xx)
theta <- function(x,xx,yy) {biserial.cor(xx[x], yy[x], level = 1)}
results <- jackknife(1:nn, theta, xx, yy)
summary(results$jack.values); results$jack.se; results$jack.bias