I have data with different types of variables. Some are character, some factors, and some numeric, like below:
library(dplyr)
df <- data.frame(a = c("tt", "ss", "ss", NA), b = c(2, 3, NA, 1), c = c(1, 2, NA, NA), d = c("tt", "ss", "ss", NA))
I'm trying to count the number of missing values per observation using c_across in dplyr. However, c_across doesn't seem to be able to combine different types of values, as the error message below suggests:
df %>%
  rowwise() %>%
  summarise(NAs = sum(is.na(c_across())))
Error: Problem with `summarise()` input `NAs`.
x Can't combine `a` <factor> and `b` <double>.
ℹ Input `NAs` is `sum(is.na(c_across()))`.
ℹ The error occurred in row 1.
Indeed, if I include only numeric variables, it works.
df %>%
  rowwise() %>%
  summarise(NAs = sum(is.na(c_across(b:c))))
Same thing if I include only character variables:
df %>%
  rowwise() %>%
  summarise(NAs = sum(is.na(c_across(c(a, d)))))
I could solve the issue without using c_across, like below, but I have lots of variables, so it's not very practical.
df %>%
  rowwise() %>%
  summarise(NAs = is.na(a) + is.na(b) + is.na(c) + is.na(d))
I could use the traditional apply approach, like below, but I'd like to solve this using dplyr.
apply(df, 1, function(x) sum(is.na(x)))
Any suggestions as to how to compute the number of missing values, row-wise, efficiently, and using dplyr?
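One direction I have considered but not verified on my real data is to drop rowwise() entirely and test each column for NA in place, so that values of different types never have to be combined into one vector. A minimal sketch, assuming the magrittr pipe (where . refers to the whole data frame) and assuming stringsAsFactors = TRUE to mimic the factor columns from the error message:

```r
library(dplyr)

df <- data.frame(
  a = c("tt", "ss", "ss", NA),
  b = c(2, 3, NA, 1),
  c = c(1, 2, NA, NA),
  d = c("tt", "ss", "ss", NA),
  stringsAsFactors = TRUE  # assumption: reproduces the factor columns
)

# is.na() on a data frame returns a logical matrix (one column per variable),
# so rowSums() gives the number of missing values in each row
res <- df %>%
  mutate(NAs = rowSums(is.na(.)))
```

This relies on base R's rowSums() inside mutate(), so I'm not sure it counts as a pure dplyr solution, and I don't know how it compares with c_across for speed.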