How to use current column name in lapply function in R?

Question

I have two data sets: The first data set contains participants' numerical answers to questions:

data <- data.frame(Q1 = 1:5,
                   Q2 = rev(1:5),
                   Q3 = c(4, 5, 1, 2, 3))

The second data set serves as a reference table where the solutions are stored:

ref.table <- data.frame(Question = c("Q1", "Q2", "Q3"),
                        Solution = c("big", "big", "small"))

I would like to compare the two data sets and create a new data set that contains the binary information on whether the answer was correct (1) or incorrect (0). For this, answers 1, 2, 3 correspond to "small", and answers 4, 5 correspond to "big".

My attempt is the following:

accuracy <- data.frame(lapply(data, function(x) {ifelse(x >= 4 & ref.table$Solution[ref.table$Question == colnames(data)[x]] == "big", 1, 0)}))

But somehow, this only gives me the incorrect answers as 0, while the correct answers are NA.

Does anyone know how to solve this? Thank you!

Note: I think there is a typo in your `ref.table`; `Solution` seems like it should be `c("big", "big", "small")` — jpsmith, Oct 26 '22 at 14:08

akrun · Accepted Answer · 2022-10-26T14:15:35.753

With tidyverse, loop across the columns with across(), match the current column name (cur_column()) against the 'Question' column of 'ref.table' to get the corresponding 'Solution' value, check whether it is 'big' and the column value is >= 4, and coerce the resulting logical to binary with the unary +.

library(dplyr)
data %>%
  mutate(across(everything(),
                ~ +(.x >= 4 &
                      ref.table$Solution[match(cur_column(), ref.table$Question)] == "big")))

-output

  Q1 Q2 Q3
1  0  1  0
2  0  1  0
3  0  0  0
4  1  0  0
5  1  0  0

Or in base R, loop over the column names with lapply and extract each column with [[. The match logic is the same as above. Note that the \(nm) lambda shorthand requires R 4.1 or later.

data[] <- lapply(names(data), \(nm) +(data[[nm]] >=4 & 
    ref.table$Solution[match(nm, ref.table$Question)] == "big"))
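As a minimal sketch of why the attempt in the question returns NA: inside `lapply(data, ...)` the function receives only the column's values, so `colnames(data)[x]` indexes by the answers themselves, and positions 4 and 5 do not exist in a three-column data frame. One way to get both the values and the name is `Map()`, which walks the columns and their names in parallel. The names `missing_names` and `accuracy` below are illustrative only, and this version scores only the "big" case, like the code above.

```r
data <- data.frame(Q1 = 1:5,
                   Q2 = rev(1:5),
                   Q3 = c(4, 5, 1, 2, 3))
ref.table <- data.frame(Question = c("Q1", "Q2", "Q3"),
                        Solution = c("big", "big", "small"))

# Indexing the three column names at positions 4 and 5 yields NA,
# which is what happens when the answers are used as the index.
missing_names <- colnames(data)[c(4, 5)]

# Map() passes each column together with its name to the function.
accuracy <- data.frame(Map(function(x, nm) {
  solution <- ref.table$Solution[match(nm, ref.table$Question)]
  as.integer(x >= 4 & solution == "big")
}, data, names(data)))

accuracy
```

Because the first argument to `Map()` is the named data frame, the result keeps the column names `Q1`, `Q2`, `Q3`.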

Thank you for your help! I added an OR argument to make sure that the numbers corresponding to the "small" condition are also accounted for: `data %>% mutate(across(everything(), ~ +((.x >= 4 & ref.table$Solution[match(cur_column(), ref.table$Question)] == "big") | (.x < 4 & ref.table$Solution[match(cur_column(), ref.table$Question)] == "small"))))` — user20291515, Nov 02 '22 at 09:04
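A possibly simpler way to account for both conditions, sketched here in base R under the question's rule that answers 4 and 5 mean "big" and 1 to 3 mean "small": classify each answer first, then compare the label with the column's solution. The names `solutions` and `accuracy` are illustrative, and the sketch assumes every column name appears in `ref.table$Question`.

```r
data <- data.frame(Q1 = 1:5,
                   Q2 = rev(1:5),
                   Q3 = c(4, 5, 1, 2, 3))
ref.table <- data.frame(Question = c("Q1", "Q2", "Q3"),
                        Solution = c("big", "big", "small"))

# Look up each column's solution by name, in column order.
solutions <- ref.table$Solution[match(names(data), ref.table$Question)]

# Label every answer as "big" or "small", then score it against the solution.
accuracy <- data
accuracy[] <- Map(function(x, solution) {
  as.integer(ifelse(x >= 4, "big", "small") == solution)
}, data, solutions)

accuracy
```

With this approach a single comparison covers both cases, so the lookup into `ref.table` does not need to be repeated for "big" and "small".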
