0

I have a quite complex question of how to write an R code, as I went through all the possible options known to me without any results. I can do the first part in excel within less than one hour, i.e., to get a result for the following formula.

C = 1/nnj(cj - aj)^2

n = number of observations, nj = number of observations in the interval, aj = proportion of correct responses for each class interval, ∑ is the sum of each j ratings to class interval.

The reason I want to do it in R is because (1) everything can be done once and with the same program, and--more important--(2) I need do jackknife (or bootstrap) this formula to get the jackknife SEs (which I need for the inferential confidence intervals).

This is the data

> str(data)
'data.frame':   372 obs. of 6 variables:
 $ ID          : int  120 432 664 163 283 659 78 150 158 188 ...
 $ age         : int  20 20 16 18 20 15 20 20 16 20 ...
 $ sex         : Factor w/ 2 levels "female","male": 1 1 1 2 2 2 1 1 1 2 ...
 $ poconf123   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ PoConfPC    : int  40 50 40 30 10 50 50 30 40 30 ...
 $ idacc       : Factor w/ 2 levels "accurate","inaccurate": 1 1 1 1 1 1 2 2 2 2 ...

To get the nj, cj and aj, I tried the following code

group_by(data, poconf123) %>%  #this is to get results for each class intervall
    summarise(poconf = mean(PoConfPC)) %>% #this is to get cj
    summarise(NumAcc=n(), prop = mean(idacc==1)/n())  #this is to get nj and aj

Using the code above I get the following error message

Error in summarise_impl(.data, dots): 
  Evaluation error: object 'PoConfPC' not found. 

The first two lines alone, or the first and the last lines alone do not produce the same error. However, the code

group_by(data, poconf123) %>% 
    summarise(NumAcc=n(), prop = mean(idacc==1)/n())

still seems not optimal, because instead of getting the proportions e.g., .53, I am getting the values "0." in the entire column 'prop'.

Of course, I could ask for a proportion table, e.g.,

prop.table(table(data$idacc))  

to get the proportions for both accurate and inaccurate but this is not really what I need.

So, overall I do not even get to the point where I can run the analyses and get a single C, not to mention the jackknife SE. And this brings me to the actual question, that is, how I can program this formula into a function, so I can 'jackknife' it. Below is an example of how to jackknife the point-biserial correlation with the outputs required for inferential confidence intervals.

library(bootstrap)
xx = data$PoConfPC
yy = data$idacc
nn = length(xx)
theta <- function(x,xx,yy) {biserial.cor(xx[x], yy[x], level = 1)}
results <- jackknife(1:nn, theta, xx, yy)
summary(results$jack.values); results$jack.se; results$jack.bias
StupidWolf
  • 45,075
  • 17
  • 40
  • 72
Natalie
  • 11
  • 2
  • Is there a compelling reason to use the `bootstrap`-package? The `boot` package is much more "standard". ALSO: need to supply str(data) at a minimum but it would be much better to supply `dput(head(data,30))` – IRTFM Mar 24 '18 at 04:04
  • When I looked for a package to use the jackknife methods described by Efron (e.g., Efron, 1980; Efron & Tibshirani, 1986) and get theta and SE for the inferential confidence intervals, this was the package that I found. I just looked up and saw that boot is based on the same principle. I am happy to use the one or the other, as long as I can get the values I am after. Ref: Efron & Tibshirani (1986). The bootstrap method for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1, 1-35. – Natalie Mar 24 '18 at 04:21
  • 1
    Use SO [edit] facilities to modify your question. – IRTFM Mar 24 '18 at 04:31
  • 1
    Needed an extra closing paren. Despite the fact that it has a class of data.frame, it does get printed as a dataframe would typically be displayed and when you run `is.data.frame` you get FALSE. I think you might have screwed up your data object. AND PLEASE read my comment again about using the [edit] facilities. Click on the damn thing. Use it. – IRTFM Mar 24 '18 at 05:09
  • I typed it manually and removed a few variables to make it fit into the comment. This is how the mistake came. An excerpt of the data is now in the question. – Natalie Mar 24 '18 at 09:58

0 Answers0