(R, dplyr) I'm trying to input student MCQ answers and the correct answers, and then compute marks; many questions have multiple correct answers. I've input the mark-scheme answers (ms) and student responses (sr) as data frames; the answer columns are list columns. (Update: question labels are row names in ms.)

I'm looking for the best way to compare the rows of each column of sr to the relevant answers in ms to find: 1. the number of correct answers chosen, and 2. whether it's an exact match.

I can compare individual elements with commands like length(intersect(sr[[1,2]], ms[[3,2]])), but I can't figure out how to scale this up ... with iteration or with purrr, mutate_at, etc.

#Sample of the data frame

#Input
library(tibble) # tribble(), remove_rownames(), column_to_rownames()
library(dplyr)  # %>%
ms <- tribble( #mark scheme
  ~q_label,     ~point_value, ~d_partial_credit, ~c_ans,
  "questions1", 5,            1,                 "B,E",
  "questions2", 4,            0,                 "C",
  "questions3", 4,            1,                 "C,E"
)

ms <- ms %>% remove_rownames() %>% column_to_rownames(var = "q_label")


sr <- tribble( #student responses
  ~id, ~questions1, ~questions2, ~questions3,
  "1", "A,E",       "C",         "C,B",
  "2", "A,D,E",     "B",         "C,E",
  "3", "E",         "C",         "A,B,C",
  "4", "E",         "C",         "C"
)

Convert the answer columns into list columns, where each element is a character vector:

ms$c_ans <- lapply(strsplit(as.character(ms$c_ans), split = ","), trimws)

makeitlist <- function(x) (lapply(strsplit(as.character(x),split=','),trimws))
for (i in 2:length(sr)) {          
  sr[[i]] <- makeitlist(sr[[i]]) 
}
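The loop above selects columns by position. A minimal sketch of the same conversion by column name, assuming a plain base-R data.frame copy of sr (answer_cols is a name introduced here for illustration, not part of the original code):

```r
# Student responses from the question, rebuilt as a plain data.frame
# (assumption: base R only, so no tibble is needed to run this)
sr <- data.frame(
  id = c("1", "2", "3", "4"),
  questions1 = c("A,E", "A,D,E", "E", "E"),
  questions2 = c("C", "B", "C", "C"),
  questions3 = c("C,B", "C,E", "A,B,C", "C"),
  stringsAsFactors = FALSE
)

# Split "A,D,E" into c("A", "D", "E"), giving one character vector per row
makeitlist <- function(x) lapply(strsplit(as.character(x), split = ","), trimws)

# Every column except id holds answers (answer_cols is a hypothetical name)
answer_cols <- setdiff(names(sr), "id")
sr[answer_cols] <- lapply(sr[answer_cols], makeitlist)

str(sr$questions1)
```

Selecting by name means the conversion does not depend on id being the first column.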

Now I want to create columns in sr... sr$num_correct1, sr$num_correct2, etc. to indicate the student got an exact match, and others to indicate number of correct answers chosen...

E.g., the number of elements in a student's row of sr$questions2 that are also in the questions2 row of ms$c_ans, i.e., in ms[[2,3]]

I think the problem breaks down into two pieces:

  1. sr$q1_full <- sr[[2]]==ms[[1,3]]

Yields a nice boolean vector, but the results are surprising (and not what I intended). Why FALSE FALSE FALSE TRUE? It should be all FALSE. Also, how can I automate this over all rows/columns?

  2. length(intersect(sr[[1,2]], ms[[1,3]]))

Works for individual elements, but how can I get it to do this comparing entire vectors such as sr[2] to ms[[1,3]]?
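Both pieces come down to applying a set operation to each student's element of the list column, rather than to the column as a whole. A likely reason for FALSE FALSE FALSE TRUE (an interpretation of R's coercion rules, not something confirmed in the post) is that == turns each list element into a single string and recycles the two-element key, so only the fourth student's "E" lines up with the key's "E". A minimal sketch using the sample data for questions1 and questions3; the names key1, resp1, key3, and resp3 are made up for illustration:

```r
# Correct answers and student responses, as they look after splitting on ","
key1  <- c("B", "E")                                    # questions1
resp1 <- list(c("A", "E"), c("A", "D", "E"), "E", "E")
key3  <- c("C", "E")                                    # questions3
resp3 <- list(c("C", "B"), c("C", "E"), c("A", "B", "C"), "C")

# Piece 1: exact match, one student at a time (order of answers is ignored)
exact1 <- vapply(resp1, function(ans) setequal(ans, key1), logical(1))
exact3 <- vapply(resp3, function(ans) setequal(ans, key3), logical(1))

# Piece 2: number of correct answers each student chose
n_correct3 <- vapply(resp3, function(ans) length(intersect(ans, key3)), integer(1))

exact1      # all FALSE: nobody chose exactly B and E
exact3      # TRUE only for student 2
n_correct3  # 1 2 1 1
```

With purrr loaded, map_lgl() and map_int() could take the place of vapply() here.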

daaronr
    This is not exactly what you are asking for but it may still be of interest to you: In the `exams` package we have a small convenience function `exams_eval()` that implements various evaluation strategies for multiple-choice and single-choice questions. See `example("exams_eval")` for some illustrations. Maybe that could be reused here. – Achim Zeileis Nov 03 '19 at 20:26

1 Answer

I've found a solution to the main problem; it's not elegant though, so I would appreciate other suggestions:

Compare each element of a single vector of responses, e.g. sr$questions1, to the corresponding element of ms, i.e., ms[["questions1", "c_ans"]], and iterate this over all questions and all students:

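library(glue) # glue() builds the new column names

for (q in rownames(ms)) {         # iterate over all questions
  for (i in seq_len(nrow(sr))) {  # iterate over all students
    # count this student's chosen answers that appear in the mark scheme
    sr[[glue("cr_ans_", q)]][[i]] <- length(intersect(sr[[q]][[i]], ms[[q, "c_ans"]]))
  }
}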
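The same counts can be computed without the inner loop or glue by applying over questions and students. A sketch in base R, assuming the mark scheme and responses are held in plain named lists instead of list columns (key, responses, and count_correct are illustrative names, not part of the original code):

```r
# Mark scheme and responses from the question, as named lists keyed by question
key <- list(
  questions1 = c("B", "E"),
  questions2 = "C",
  questions3 = c("C", "E")
)
responses <- list(
  questions1 = list(c("A", "E"), c("A", "D", "E"), "E", "E"),
  questions2 = list("C", "B", "C", "C"),
  questions3 = list(c("C", "B"), c("C", "E"), c("A", "B", "C"), "C")
)

# For one question, count how many of each student's choices are in the key
count_correct <- function(q) {
  vapply(responses[[q]], function(ans) length(intersect(ans, key[[q]])), integer(1))
}

# One column per question, one row per student
n_correct <- sapply(names(key), count_correct)
n_correct
```

Swapping length(intersect(ans, key[[q]])) for setequal(ans, key[[q]]), and integer(1) for logical(1), would give the matching exact-match matrix.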

The crucial issue was comparing elements of lists to one another, rather than comparing lists themselves to elements or to one another (not even lists of a single element). The [[]] is vital.

'glue' also helped (though it's inelegant), as did realising I could iterate over the row names.

daaronr
    Got the whole thing to work ... shared on GitHub: [repo HERE](https://github.com/daaronr/dr-rstuff/blob/master/marking-in-r/marking_code.Rmd). But I’m not happy with it. - My decision to use data frames with ‘list columns’ (each element in the column is a list of characters) is very awkward and slow. - … I end up pasting a lot of columns from the ‘marking sheet’ (`ms`) into the student responses sheet (`sr`), which leads to a lot of redundancy (untidiness). - I think I should have used multidimensional arrays instead. Any thoughts? – daaronr Jan 16 '20 at 15:50