
I have written two functions:

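# Count NAs per row, over all columns or over the columns named in `vars`
count_na_row <- function(., vars = NULL){
  
  if(is.null(vars)){
    rowSums(is.na(.))
  } else {
    # all_of() makes it explicit that `vars` is an external character vector
    rowSums(is.na(select(., all_of(vars))))
  }
  
}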

count_na_row_where <- function(., .fun = NULL){
  
  if(is.null(.fun)){
    warning('.fun is NULL. No function is applied. Counts NAs on all columns')
    rowSums(is.na(.))
  } else {
    rowSums(is.na(select(., where(.fun))))
  }
}

The functions are applied as follows:

library(tidyverse)

df <- 
  tibble(
    var1 = c(1, 2, NA, 1, 3, NA, 1, 7),
    var2 = c(NA, 3, 1, 3, NA, 9, NA, 4),
    varstr = c("A", "B", "", "", 'F', 'C', 'A', "M")
  )

df %>% 
  mutate(
    na_count = count_na_row(.),
    na_count_str = count_na_row_where(., is.character)
  )

My trouble is that the functions do not take into account NA values that are recoded inside the same mutate statement:

df %>% 
  mutate(
    varstr = ifelse(varstr == "", NA, varstr),
    na_count = count_na_row(.),
    na_count_str = count_na_row_where(., is.character),
    na_count_num = count_na_row_where(., is.numeric)
  )

But if I recode in a separate, preceding mutate statement, it works:

df %>%
  mutate(
    varstr = ifelse(varstr == "", NA, varstr)
  ) %>% 
  mutate(
    na_count = count_na_row(.),
    na_count_str = count_na_row_where(., is.character),
    na_count_num = count_na_row_where(., is.numeric)
  )

How can I adapt the functions so that I can recode into NA values inside the same mutate statement? I suspect the issue lies with rowSums.

oskjerv
  • Maybe this could help? https://stackoverflow.com/questions/33672059/rowsums-with-all-na –  Dec 03 '21 at 10:34

1 Answer

This is working as intended. It has less to do with rowSums than with the . placeholder.

In the magrittr documentation we find:

Placing lhs elsewhere in rhs call

Often you will want lhs to the rhs call at another position than the first. For this purpose you can use the dot (.) as placeholder. For example, y %>% f(x, .) is equivalent to f(x, y) and z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z).
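
To make the quoted rule concrete, here is a minimal sketch. The vector x is throwaway example data, not something from the question:

```r
library(magrittr)

x <- c(1, 2, 3)  # hypothetical example data

# `.` is replaced by the whole left-hand side, here as the second argument
piped  <- x %>% paste("value:", .)
direct <- paste("value:", x)

identical(piped, direct)  # TRUE
```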

What is important here is that . refers to the LHS. When you structure your pipeline like this:

df %>%
  mutate(x = ...,
         y = rowSums(.))

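The LHS will still be df, as that is the code before the last %>%. The dot is bound to that original data frame before mutate runs, so it never sees columns changed earlier in the same call. If you want to take the mutated x into account, you have to get it into the LHS by chaining a second mutate call, like so: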

df %>%
  mutate(x = ...) %>%
  mutate(y = rowSums(.))
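
If you would rather keep everything in one mutate call, one possible approach is to avoid the dot altogether and select columns through dplyr's data mask, which does reflect changes made earlier in the same call. The sketch below assumes dplyr >= 1.1.0 for pick() (on older versions, across() without a function played a similar role), and count_na is a hypothetical helper name, not one of the functions from the question:

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()

df <- tibble(
  var1   = c(1, 2, NA, 1, 3, NA, 1, 7),
  var2   = c(NA, 3, 1, 3, NA, 9, NA, 4),
  varstr = c("A", "B", "", "", "F", "C", "A", "M")
)

# Hypothetical helper: count NAs per row of whatever data frame it is given
count_na <- function(data) unname(rowSums(is.na(data)))

result <- df %>%
  mutate(
    varstr       = na_if(varstr, ""),                  # recode "" to NA
    # pick() selects from the current state of the data, so the recoded
    # varstr is already visible here
    na_count     = count_na(pick(everything())),
    na_count_str = count_na(pick(where(is.character))),
    # note: this also picks up na_count and na_count_str, which are numeric
    # but contain no NAs, so the count is unaffected
    na_count_num = count_na(pick(where(is.numeric)))
  )
```

The helper takes the data as an ordinary argument, so the caller decides which columns it sees. Any column created earlier in the same mutate is included in the selection, which is worth keeping in mind when using where(is.numeric).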
koolmees