0

Hi I have 10 variables with the same ending and I am trying to use mutate_at to create a new variable based off of data in those variables and assign it back to the dataframe. If any of the variables with the ending "xyz" has data (i.e. is not NA) then I would like to assign the count of values, otherwise a value of NA.

df %<>% mutate_at(vars(ends_with("xyz")), funs(new_var = ifelse(!is.na(), 1, NA)))

The above code gives an error requiring an argument for !is.na() but the vars argument requires a function. How do I combine this?

Edit: Here is the reproducible example and desired output:

`# A tibble: 6 x 6
       1_abc    1_xyz     2_abc      2_xyz     3_abc   3_xyz
1       NA        1          NA          1        NA      NA
2       NA       NA          NA         NA        NA      NA 
3       NA       NA          NA          1        NA      NA
4       NA       NA          NA         NA        NA      NA
5       NA       NA          NA         NA        NA      NA 
6       NA        1          NA         NA        NA      NA`

The desired output would be a variable such as xyz_num where values would be NA if all _xyz vars are NA or the count of non-null variables if any of the _xyz vars are not NA.

`# A tibble: 6 x 7
       1_abc    1_xyz     2_abc      2_xyz     3_abc   3_xyz   xyz_num
1       NA        1          NA          1        NA      NA         2      
2       NA       NA          NA         NA        NA      NA        NA
3       NA       NA          NA          1        NA      NA         1
4       NA       NA          NA         NA        NA      NA        NA
5       NA       NA          NA         NA        NA      NA        NA
6       NA        1          NA         NA        NA      NA         1`
Aramis7d
  • 2,444
  • 19
  • 25
AstraOK
  • 49
  • 9
  • 2
    `...is.na(.)...` – Sotos Mar 13 '19 at 10:12
  • Oh yes...That was obvious. Thank you so much!!! – AstraOK Mar 13 '19 at 10:45
  • Sorry this actually isn't giving me what I thought. I think it is more something like this: `df %<>% mutate(new_var = ifelse(!is.na(vars(ends_with("xyz"))), 1, NA))` But this is assigning every observation a value of 1. I only want a 1 assigned if any one of the 10 variables ending with xyz contains data instead of an NA....any suggestions? – AstraOK Mar 13 '19 at 11:24
  • For further help, please post a reproducible example and expected output. Thanks – Sotos Mar 13 '19 at 12:33
  • Hi @AstraOK, should the first row for column `xyz_num` be `1` instead of `2`? – Aramis7d Mar 13 '19 at 14:09
  • It would be better if it were 2 (as the count of the variables with non NA data is 2 1_xyz and 2_xyz), but I would settle for 1 :). – AstraOK Mar 13 '19 at 14:13

1 Answers1

3

with dplyr, you can try something like

df1 %>%
  select(ends_with("_xyz")) %>%
  mutate(nnums = rowSums(!is.na(.)))

assuming input is

structure(list(X1_abc = c(NA, NA, NA, NA, NA, NA), X1_xyz = c(1, 
NA, NA, NA, NA, 1), X2_abc = c(NA, NA, NA, NA, NA, NA), X2_xyz = c(1, 
NA, 1, NA, NA, NA), X3_abc = c(NA, NA, NA, NA, NA, NA), X3_xyz = c(NA, 
NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-6L))

it returns

  X1_xyz X2_xyz X3_xyz nnums
1      1      1     NA     2
2     NA     NA     NA     0
3     NA      1     NA     1
4     NA     NA     NA     0
5     NA     NA     NA     0
6      1     NA     NA     1

i hope you can modify around the code to keep the columns you want.

EDIT 1:

to keep all columns, try

df1 %<>%
  mutate(nnums = rowSums(!is.na(select(df1, ends_with("_xyz")))))
Aramis7d
  • 2,444
  • 19
  • 25
  • Thanks so much. The problem is I need to keep all of the other ~200 variables in the data set which is why I'm trying to work with vars and ends_with instead of select. I've tried using filter as well `df1 <- df %>% filter_at(vars(ends_with("_xyz"))) %>% mutate(nnums = rowSums(!is.na(.)))` but this gives an error. Is there a way to do this while keeping the data together? – AstraOK Mar 13 '19 at 14:56
  • 1
    added an edit to address your need. also, `filter` is used to operate on rows, so that won't really help much in this situation. – Aramis7d Mar 13 '19 at 14:59
  • That's it. Thanks so much for the help! – AstraOK Mar 13 '19 at 17:08
  • It will work indeed, but rowSums has the drawback of first converting the data frame to a matrix. Can be problematic for big datasets. An alternative is to build a quoted expression with the formula (play with rlang::syms, rlang::call2, quo and '!!') and then evaluate it with `df1 %>% mutate(nnums = !! the_quote)`. More robust but also trickier... – Pierre Gramme Mar 13 '19 at 17:20