I have data like the following:
# Create fake data frame
score <- rep(1:3, 2)
id <- rep(c(2014, 2015), each = 3)
var_if_1 <- rep(c(0.1, 0.8), each = 3)
var_if_2 <- rep(c(0.9, 0.7), each = 3)
var_if_3 <- rep(c(0.6, 0.2), each = 3)
data.frame(score, id, var_if_1, var_if_2, var_if_3)
More specifically, each row is uniquely identified by two columns of the data frame (e.g. score and id), and there are many additional columns whose names begin with a common string (e.g. "var_if_") and end with a different number (e.g. 1, 2, 3). Furthermore, for a given value of id (i.e. for any row with a given id), the values of the additional variables do not vary.
I am trying to convert these data into a data frame like the following:
# Desired output data frame
score <- rep(1:3, 2)
id <- rep(c(2014, 2015), each = 3)
var <- c(0.1, 0.9, 0.6, 0.8, 0.7, 0.2)
data.frame(score, id, var)
More specifically, the additional variables (var_if_#) are removed and collapsed into a single new variable (e.g. var), which takes its value from one of the var_if_# columns depending on the value of score. For example, if score == 2, then var == var_if_2.
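To make the rule concrete, here is a minimal base R sketch of the mapping described above. It assumes the columns are named exactly "var_if_" followed by the score value; the names var_cols, wanted, idx and result are made up for this illustration, and it is only meant to show the intended result, not necessarily the best approach.

```r
# Example data from the question
foo <- data.frame(
  score    = rep(1:3, 2),
  id       = rep(c(2014, 2015), each = 3),
  var_if_1 = rep(c(0.1, 0.8), each = 3),
  var_if_2 = rep(c(0.9, 0.7), each = 3),
  var_if_3 = rep(c(0.6, 0.2), each = 3)
)

# All candidate columns (assumption: they share the "var_if_" prefix)
var_cols <- grep("^var_if_", names(foo), value = TRUE)

# Column each row should read from, e.g. "var_if_2" when score == 2
wanted <- paste0("var_if_", foo$score)

# Two-column matrix index: (row number, position of the wanted column)
idx <- cbind(seq_len(nrow(foo)), match(wanted, var_cols))

# Pick one cell per row, then keep only the columns of interest
foo$var <- as.matrix(foo[var_cols])[idx]
result <- foo[c("score", "id", "var")]
```

Because each row is looked up by its own score, this does not depend on the order of the rows.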
Constraints on the solution
- Looking to use base R or dplyr.
- Looking for a solution that generalizes to a large number of 'score' values with corresponding 'var_if_#' columns, and to rows in arbitrary order.
The following illustrates arbitrary row ordering:
score <- rep(1:3, 2)
id <- rep(c(2014, 2015), each = 3)
var_if_1 <- rep(c(0.1, 0.8), each = 3)
var_if_2 <- rep(c(0.9, 0.7), each = 3)
var_if_3 <- rep(c(0.6, 0.2), each = 3)
foo <- data.frame(score, id, var_if_1, var_if_2, var_if_3)
foo[sample(1:nrow(foo)), ] # arbitrary row order
I am also aware that I could just use nested ifelse() calls, but this becomes tedious with many possible values of score (unless there is a looping approach that can reduce the tedium).
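For reference, the kind of loop I have in mind might look like the rough sketch below. It assumes every column name is "var_if_" followed by a number that matches a value of score; var_cols, k and hit are names invented for the illustration, and set.seed() is only there to make the shuffle repeatable.

```r
# Example data from the question, shuffled into arbitrary row order
foo <- data.frame(
  score    = rep(1:3, 2),
  id       = rep(c(2014, 2015), each = 3),
  var_if_1 = rep(c(0.1, 0.8), each = 3),
  var_if_2 = rep(c(0.9, 0.7), each = 3),
  var_if_3 = rep(c(0.6, 0.2), each = 3)
)
set.seed(1)
foo <- foo[sample(nrow(foo)), ]

# Loop over the var_if_ columns instead of writing one ifelse() per score
var_cols <- grep("^var_if_", names(foo), value = TRUE)
foo$var <- NA_real_
for (col in var_cols) {
  k <- as.numeric(sub("^var_if_", "", col))  # numeric suffix of the column name
  hit <- foo$score == k                      # rows whose score matches this column
  foo$var[hit] <- foo[[col]][hit]
}

# Drop the helper columns
foo <- foo[c("score", "id", "var")]
```

This removes the hand-written ifelse() chain, but it still makes one pass per var_if_# column, so I would welcome something more direct.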