Creating New Variables in R: issues with missing data

Question

I keep encountering a problem in my R code with generating a new variable based upon another variable. Every participant has entries for multiple different variables. Not all of these variables matter for each participant. I have a dummy coded variable which I use to tell me which variable I should use when generating my new variable. Here is what my data would look like.

data
id use v1 v2 v3
1  1   2  2  1  
2  2   NA 1  2 
3  3   1  NA 3
4  1   3  5  NA
5  2   4  4  1

I want to create a new variable, x, using the dummy coded variable. For this example, if use is 1, I want to use the value of v1 for x. If use is 2, I want to use v2 for x. If use is 3, I want to use v3 for x. Here is the code I use.

data$x [data$use == 1] <- data$v1
data$x [data$use == 2] <- data$v2
data$x [data$use == 3] <- data$v3

When I run the code, I get the message "number of items to replace is not a multiple of replacement length".

I did some research and I think this has something to do with data being missing (though I could be wrong). I tried using is.na() within the [], but this did not solve the issue.

I have used ifelse to solve problems similar to this before, but I don't think that code would work in this circumstance because I have more than two situations (I am not sure if ifelse is cumulative or not).

Why does this error occur and what is the best way to resolve this?

thelatemail · Accepted Answer · 2021-05-19T21:57:25.037

Your issue is that your left and right hand sides of the <- assignment are different lengths.

## data$x[data$use == 1] <- data$v1

data$x[data$use == 1]
#[1] 2 3

data$v1
#[1]  2 NA  1  3  4
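A quick way to see the mismatch is to count both sides. This is only a sketch: it rebuilds just the two relevant columns from the question's data, and n_target and n_source are illustrative names, not anything from the original code.

```r
# Only the two columns needed for the check, copied from the question's data
data <- data.frame(use = c(1, 2, 3, 1, 2), v1 = c(2, NA, 1, 3, 4))

n_target <- sum(data$use == 1)   # slots selected on the left-hand side
n_source <- length(data$v1)      # values offered on the right-hand side

c(target = n_target, source = n_source)

# A non-zero remainder means the slots are not a whole multiple of the
# replacement length, which is the condition the message describes
n_target %% n_source != 0
```

Note that the NA in v1 plays no part here; the counts differ regardless of missing values.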

If you match them up by selecting on both sides, you're laughing:

data$x[data$use == 1] <- data$v1[data$use == 1]
data$x[data$use == 2] <- data$v2[data$use == 2]
data$x[data$use == 3] <- data$v3[data$use == 3]

#  id use v1 v2 v3 x
#1  1   1  2  2  1 2
#2  2   2 NA  1  2 1
#3  3   3  1 NA  3 3
#4  4   1  3  5 NA 3
#5  5   2  4  4  1 4
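On the ifelse question: ifelse() is vectorised and can be nested, so it can handle more than two cases. A minimal sketch, assuming the question's data rebuilt by hand as a plain data.frame, and assuming use only takes the values 1 to 3 (anything else falls through to NA):

```r
# Example data copied from the question
data <- data.frame(
  id  = 1:5,
  use = c(1, 2, 3, 1, 2),
  v1  = c(2, NA, 1, 3, 4),
  v2  = c(2, 1, NA, 5, 4),
  v3  = c(1, 2, 3, NA, 1)
)

# Each row takes its value from the column that its own 'use' points to;
# rows where 'use' is not 1, 2 or 3 end up as NA
data$x <- ifelse(data$use == 1, data$v1,
          ifelse(data$use == 2, data$v2,
          ifelse(data$use == 3, data$v3, NA)))

data$x
```

This gets unwieldy as the number of variables grows, since every extra variable needs another level of nesting.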

However, you can avoid writing one line per variable, and make this work for any number of variables, by using matrix indexing, as per this previous answer of mine: https://stackoverflow.com/a/33862219/496803

data[c("v1","v2","v3")][cbind(seq_len(nrow(data)), data$use)]
#[1] 2 1 3 3 4

This essentially uses a matrix with a row and column index to grab the right value from the v1-3 variables:

cbind(seq_len(nrow(data)), data$use)
##    row  col
#     [,1] [,2]
#[1,]    1    1
#[2,]    2    2
#[3,]    3    3
#[4,]    4    1
#[5,]    5    2


## assign it to get the same result:
data$x <- data[c("v1","v2","v3")][cbind(seq_len(nrow(data)), data$use)]
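If this is needed in several places, the matrix-indexing step could be wrapped in a small function. The sketch below is illustrative only: pick_by_index is a hypothetical helper name, not part of base R or of the linked answer, and the data frame is the question's example rebuilt by hand.

```r
# Example data copied from the question
data <- data.frame(
  id  = 1:5,
  use = c(1, 2, 3, 1, 2),
  v1  = c(2, NA, 1, 3, 4),
  v2  = c(2, 1, NA, 5, 4),
  v3  = c(1, 2, 3, NA, 1)
)

# Hypothetical helper: for each row, read the value from the column of
# 'cols' whose position is given by 'index'
pick_by_index <- function(df, cols, index) {
  m <- as.matrix(df[cols])              # candidate columns as a plain matrix
  m[cbind(seq_len(nrow(df)), index)]    # one (row, column) pair per row
}

data$x <- pick_by_index(data, c("v1", "v2", "v3"), data$use)
data$x
```

Because the lookup is positional, the order of the names in cols has to match the coding of use (1 for the first name, 2 for the second, and so on).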

Thank you. This code has been extremely helpful. I have used this solution in a lot of my other projects. — Eric Boorman, Dec 07 '21 at 18:23

score 0 · Answer 2 · answered May 19 '21 at 21:41

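You can try the code below, which keeps, for each of v1, v2 and v3, only the values from rows where use matches the number in the variable name, and assigns the results to objects of the same names in the global environment: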

v <- c("v1", "v2", "v3")
list2env(
  setNames(
    lapply(v, function(x) data[[x]][data$use == gsub("\\D", "", x)]),
    v
  ),
  envir = .GlobalEnv
)

and you can check it by

> mget(ls(pattern = "v\\d+"))
$v1
[1] 2 3

$v2
[1] 1 4

$v3
[1] 3
