How to create a new variable within a data frame based on multiple binary variables using R language?

Question

The data frame df has 13 variables as follows,

column 1-id 
column 2-pr_1 (pr_1 to pr_12 are binary variables)
column 3-pr_2
column 4-pr_3
column 5-pr_4
column 6-pr_5
column 7-pr_6
column 8-pr_7
column 9-pr_8
column 10-pr_9
column 11-pr_10
column 12-pr_11
column 13-pr_12
Now a variable "try" needs to be created within the data frame, for each observation, according to the following rules:

1)-The value of pr_1 always equals 1.
2)-If all elements from pr_1 to pr_12 are 1, then try=13.
3)-If there is a missing value (NA) anywhere from pr_1 to pr_12, then try=NA.
4)-If the first 0 occurs right after the last 1 (for example, the first 0 is in pr_6 and the last 1 is in pr_5), then "try" should equal the position of that first 0, here 6 (6=5+1).

In other words, "try" should equal the number of consecutive 1s at the start of the row (with no 0 or NA interrupting the run) plus 1.

The new dataset with the new variable "try" looks like this,

id  pr_1 pr_2 pr_3 pr_4 pr_5 pr_6 pr_7 pr_8 pr_9 pr_10 pr_11 pr_12 try
j01    1    1    1    1    1    0    0    0    0     0     0     0   6
j02    1    1    1    0    0    0    0    0    0     0     0     0   4
j03    1    0    0    0    0    0    0    0    0     0     0     0   2
j04    1    1    1    1    1    1    1    1    1     1     1     1  13
j05    1    1    1    1    1    1    1    1   NA     1     1    NA  NA
j06    1    1    1    1    1   NA   NA   NA   NA    NA    NA    NA  NA
j07    1    0   NA    0    0    0    0    0    0     0     0     0  NA
j08    1   NA    0    0    0    0    0    0    0     0     0     0  NA
j09    1   NA    0   NA   NA    1   NA   NA   NA    NA     1     1  NA
j10    1   NA    1    1    1    1    1    1    1     1     1     0  NA

The original data set is structured as follows,

structure(list(id = c("j01", "j02", "j03", "j04", "j05", "j06", 
"j07", "j08", "j09", "j10"), pr_1 = c(1, 1, 1, 1, 1, 1, 1, 1, 
1, 1), pr_2 = c(1, 1, 0, 1, 1, 1, 0, NA, NA, NA), pr_3 = c(1, 
1, 0, 1, 1, 1, NA, 0, 0, 1), pr_4 = c(1, 0, 0, 1, 1, NA, 0, 
0, NA, 1), pr_5 = c(1, 0, 0, 1, 1, NA, 0, 0, 1, 1), pr_6 = c(0, 
0, 0, 1, 1, NA, 0, 0, NA, 1), pr_7 = c(0, 0, 0, 1, 1, NA, 0, 
0, NA, 1), pr_8 = c(0, 0, 0, 1, 1, NA, 0, 0, NA, 1), pr_9 = c(0, 
0, 0, 1, NA, NA, 0, 0, 1, 1), pr_10 = c(0, 0, 0, 1, 1, NA, 0, 
0, NA, 1), pr_11 = c(0, 0, 0, 1, 1, NA, 0, 0, NA, 1), pr_12 = c(0, 
0, 0, 1, 0, NA, 0, 0, NA, 0)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))->df

score 1 · Accepted Answer · answered Feb 02 '21 at 07:31

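Assuming the 1s always come first in each row, as the question states, you can take the row-wise sum of all pr columns and add 1. rowSums() returns NA for any row containing an NA, which gives the NA rule for free.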

df$try <- rowSums(df[-1]) + 1

# id       pr_1  pr_2  pr_3  pr_4  pr_5  pr_6  pr_7  pr_8  pr_9 pr_10 pr_11 pr_12   try
#   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 j01       1     1     1     1     1     0     0     0     0     0     0     0     6
# 2 j02       1     1     1     0     0     0     0     0     0     0     0     0     4
# 3 j03       1     0     0     0     0     0     0     0     0     0     0     0     2
# 4 j04       1     1     1     1     1     1     1     1     1     1     1     1    13
# 5 j05       1     1     1     1     1     1     1     1    NA     1     1     0    NA
# 6 j06       1     1     1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
# 7 j07       1     0     0     0     0     0     0     0     0     0     0     0     2
# 8 j08       1    NA     0     0     0     0     0     0     0     0     0     0    NA
# 9 j09       1    NA    NA    NA     1    NA    NA    NA     1    NA    NA    NA    NA
#10 j10       1    NA     1     1     1     1     1     1     1     1     1     0    NA

Or using dplyr:

library(dplyr)
df %>% mutate(try = rowSums(select(., starts_with('pr'))) + 1)
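The row-sum approach above depends on two things: rowSums() leaves na.rm = FALSE by default, and the 1s in each row are contiguous from pr_1. A minimal base R sketch on a small hypothetical data frame `toy` (not the question's df) shows both, including a row where the second assumption does not hold:

```r
# Hypothetical toy data with three pr columns, for illustration only.
toy <- data.frame(
  id   = c("a", "b", "c", "d"),
  pr_1 = c(1, 1, 1, 1),
  pr_2 = c(1, 0, NA, 0),
  pr_3 = c(1, 0, 0, 1)
)

# na.rm = FALSE is the default, so a row with any NA sums to NA.
toy$try <- rowSums(toy[-1]) + 1

# Values per row: a = 4, b = 2, c = NA, d = 3.
# Row d (1 0 1) is counted as 3 although its leading run of 1s has length 1,
# so the result matches the rule only when the 1s are contiguous from pr_1.
print(toy)
```

If rows like d can occur in the real data, the plain row sum would need an extra check.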

score 0 · Answer 2 · answered Feb 02 '21 at 06:45

I believe this will give you the "try" column:

rowSums(apply(df[, 2:13], 2, function(x) (x == 1))) + 1

The general idea is to check, column by column, whether each element equals 1, and then sum by row. Note that column pr_3 in the dataset you provided is not identical to the one shown in your table. I assume this is a typo, so to reproduce the results you want I changed it from pr_3 = c(1, 1, 0, 1, 1, 1, NA, 0, NA, 1) to pr_3 = c(1, 1, 0, 1, 1, 1, 0, 0, NA, 1).
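As a small illustration of the comparison idea, the sketch below uses a hypothetical data frame `toy` (not the question's df). It assumes only base R and shows that comparing the data frame with 1 directly yields the same logical matrix as the apply() call, with NA preserved:

```r
# Hypothetical toy data, for illustration only.
toy <- data.frame(
  id   = c("a", "b", "c"),
  pr_1 = c(1, 1, 1),
  pr_2 = c(1, 0, NA),
  pr_3 = c(1, 0, 0)
)

# Column-wise comparison via apply(), as in the answer.
via_apply <- rowSums(apply(toy[, 2:4], 2, function(x) x == 1)) + 1

# Comparing the data frame directly also gives a logical matrix;
# NA == 1 is NA, so rows with missing values still end up as NA.
via_compare <- rowSums(toy[, 2:4] == 1) + 1

# Both give the values 4, 2, NA.
print(unname(via_apply))
print(unname(via_compare))
```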

score 0 · Answer 3 · answered Feb 02 '21 at 07:07

You can use this code too. However, I believe you wanted NA when there is a pattern such as 101111? The code below does not do that; it still counts all the 1s.

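library(magrittr)  # provides the %>% pipe
df %>% 
  tidyr::pivot_longer(-id) %>%              # one row per id and pr_* column
  dplyr::group_by(id) %>% 
  dplyr::mutate(try = sum(value) + 1) %>%   # NA if any value for that id is NA
  dplyr::ungroup() %>% 
  tidyr::pivot_wider(names_from = name, values_from = value)  # back to wide format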
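If NA really is wanted for patterns such as 101111, one possible sketch in base R is below. The helper name `try_strict` and the data frame `toy` are hypothetical, and returning NA for such rows is an assumption about the intended rule, not something the question confirms:

```r
# Hypothetical toy data, for illustration only.
toy <- data.frame(
  id   = c("a", "b", "c", "d"),
  pr_1 = c(1, 1, 1, 1),
  pr_2 = c(1, 0, 0, NA),
  pr_3 = c(1, 1, 0, 0),
  pr_4 = c(0, 1, 0, 0)
)
pr_cols <- grep("^pr_", names(toy), value = TRUE)

# Returns NA if the row has a missing value or if a 1 appears after a 0;
# otherwise returns the number of 1s plus 1.
try_strict <- function(x) {
  if (anyNA(x)) return(NA_real_)
  if (any(diff(x) > 0)) return(NA_real_)  # a 0 -> 1 step means the run was broken
  sum(x) + 1
}

toy$try <- apply(as.matrix(toy[pr_cols]), 1, try_strict)

# Values per row: a = 4, b = NA (pattern 1 0 1 1), c = 2, d = NA (missing value).
print(toy)
```

The diff() check works here because the columns are 0/1, so any positive difference marks a 1 that follows a 0.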
