3

I have a dataframe:

dat <- data.frame(X1 = c(0, NA, NA),
                  X2 = c(1, NA, NA),
                  X3 = c(1, NA, NA),
                  Y1 = c(1, NA, NA),
                  Y2 = c(NA, NA, NA),
                  Y3 = c(0, NA, NA))

I want to create a composite score for X and Y variables. This is what I have so far:

clean_dat <- dat %>% rowwise() %>% mutate(X = sum(c(X1, X2, X3), na.rm = T),
                                          Y = sum(c(Y1, Y2, Y3), na.rm = T))

However, I want the composite score for the rows with all NAs (i.e. rows 2 and 3) to be 0 in the column X and Y. Does anyone know how to do this?

Edit: I'd like to know how I can make X and Y in rows 2 and 3 NA too.

Thanks so much!

jo_
  • 677
  • 2
  • 11

1 Answers1

2

By default, sum or rowSums return 0 when we use na.rm = TRUE and when all the elements are NA. To prevent this either use an if/else or case_when approach i.e. determine whether there are any non-NA elements with if_any, then take the rowSums of the concerned columns within case_when (by default the TRUE will return NA)

library(dplyr)
dat %>% 
  mutate(X = case_when(if_any(starts_with('X'), complete.cases) 
    ~ rowSums(across(starts_with('X')), na.rm = TRUE)),
   Y = case_when(if_any(starts_with('Y'), complete.cases) ~ 
    rowSums(across(starts_with('Y')), na.rm = TRUE)) )

-output

  X1 X2 X3 Y1 Y2 Y3  X  Y
1  0  1  1  1 NA  0  2  1
2 NA NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks! I'm hoping the value for Y in row 1 will be 1 instead of 0 (i.e. ignoring the NA in Y2). Everything else is correct. Is that possible? – jo_ May 16 '22 at 16:58
  • 1
    @jo_ sorry, there was a typo. I closed the bracket early – akrun May 16 '22 at 17:00
  • thank you! And how do I make X and Y `NA` in rows 2 and 3? – jo_ May 16 '22 at 17:06
  • @jo_ is this a new change in your post – akrun May 16 '22 at 17:08
  • yes! I am curious how it'll look like if it was `NA` instead of `0` for `Y` in rows 2 and 3. Sorry if that was confusing. I can make a separate post if that's better. – jo_ May 16 '22 at 17:10
  • I applied your solution to a bigger dataset with more columns. An error that comes up is that '`X` must be a vector, not a `formula` object'. I checked that I copied your solution correctly. Do you know what might have gone wrong? I can try simulating the bigger dataset if that's helpful. – jo_ May 16 '22 at 17:51
  • 1
    @jo_ I assume that you have some other columns with names that starts with 'X' and is not numeric. In that case you can specify the column names as such i.e. `if_any(c(X1, X2, X3), complete.cases)` and `across(c(X1, X2, X3))` Similarly for `Y` columns – akrun May 16 '22 at 17:53
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/244783/discussion-between-jo-and-akrun). – jo_ May 16 '22 at 18:04