
My data takes this format:

library(tidyverse)
df <- mtcars
df <- df %>% mutate(vs_doubled = vs * 2) %>% select(mpg, cyl, vs, am, vs_doubled)

head(df)


#>    mpg cyl vs am vs_doubled
#> 1 21.0   6  0  1          0
#> 2 21.0   6  0  1          0
#> 3 22.8   4  1  1          2
#> 4 21.4   6  1  0          2
#> 5 18.7   8  0  0          0
#> 6 18.1   6  1  0          2

I'm trying to use mutate_at and na_if to set 0 values to NA, but only for specific columns ("vs" and "am"). I would like to leave the zeros in the column "vs_doubled" untouched.

I haven't quite got it right, because the following line doesn't work:

df <- df %>% mutate_at(.vars = c("vs", "am"), .funs = na_if(y = 0))
Ronak Shah
Jeremy K.

3 Answers


Update

From dplyr 1.0.0 we can use across:

library(dplyr)
df %>% mutate(across(c(vs, am), ~ na_if(.x, 0))) %>% head

#   mpg cyl vs am vs_doubled
#1 21.0   6 NA  1          0
#2 21.0   6 NA  1          0
#3 22.8   4  1  1          2
#4 21.4   6  1 NA          2
#5 18.7   8 NA NA          0
#6 18.1   6  1 NA          2

Original answer

In earlier versions of dplyr, we can use mutate_at:

df %>%  mutate_at(vars(vs,am), ~na_if(.,0)) %>% head

Or another way would be

df %>% mutate_at(vars(vs,am), na_if, 0)

~ is purrr-style formula syntax, and . stands for the values of the column being processed. It is shorthand for an anonymous function, with which the call above would be written as

df %>%  mutate_at(vars(vs,am), function(x) na_if(x, 0)) 

The alternative shown above does not need ~ at all: we can pass the function itself, followed by its additional arguments (here 0, which is matched to y).
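As a rough check of the explanation above, here is a minimal sketch that runs the three forms side by side on the question's data. The names res_lambda, res_anon and res_args are made up for illustration. It also hints at why the line in the question fails: .funs = na_if(y = 0) calls na_if() immediately, with no x, instead of handing mutate_at a function to call later.

```r
library(dplyr)

# Rebuild the data frame from the question
df <- mtcars %>%
  mutate(vs_doubled = vs * 2) %>%
  select(mpg, cyl, vs, am, vs_doubled)

# Three ways to hand na_if() to mutate_at()
res_lambda <- df %>% mutate_at(vars(vs, am), ~ na_if(., 0))               # formula shorthand
res_anon   <- df %>% mutate_at(vars(vs, am), function(x) na_if(x, 0))     # anonymous function
res_args   <- df %>% mutate_at(vars(vs, am), na_if, 0)                    # function plus extra argument

# All three should agree with one another
all.equal(res_lambda, res_anon)
all.equal(res_lambda, res_args)

# vs_doubled was not selected, so its zeros are left alone
anyNA(res_lambda$vs_doubled)
```

Which form to use is mostly a matter of taste; the formula shorthand is the one that carries over most directly to across().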


And of course there are other ways to do this without using na_if

df %>% mutate_at(vars(vs, am), ~replace(., . == 0, NA)) 

Or the same with base R

cols <- c("vs", "am")
df[cols] <- lapply(df[cols], function(x) replace(x, x == 0, NA))
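For completeness, a small sketch of one more base R spelling, using a logical matrix as the index. It assumes the same cols vector as above; df_lapply and df_index are hypothetical names used only to compare the two results.

```r
# Rebuild the question's data with base R only
df <- mtcars[c("mpg", "cyl", "vs", "am")]
df$vs_doubled <- df$vs * 2

cols <- c("vs", "am")

# Version from the answer: replace zeros column by column
df_lapply <- df
df_lapply[cols] <- lapply(df_lapply[cols], function(x) replace(x, x == 0, NA))

# Alternative: index the selected columns with a logical matrix
df_index <- df
df_index[cols][df_index[cols] == 0] <- NA

# Both versions should give the same data frame
all.equal(df_lapply, df_index)
```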
Ronak Shah
  • It would be good to provide a basic explanation of what `~` and `.` do here. I like the `tidyverse` but it's frustrating when the answers to `tidyverse` questions often end up being "add these arbitrary symbols". – Marius Jul 08 '19 at 05:31
  • With `~na_if()`, is the `~` to indicate that the next part is a formula? It won't just take a function? I'm still trying to work out why I should have used `~` before the `na_if`. Thank you. – Jeremy K. Jul 08 '19 at 05:31
  • For anyone looking for an explanation of the `~` and `.`, I'm finding this link very useful https://suzan.rbind.io/2018/02/dplyr-tutorial-2/#mutate-at-to-change-specific-columns – Jeremy K. Jul 08 '19 at 05:39

We can use case_when with mutate_at

library(tidyverse)
df %>% 
     mutate_at(vars(vs, am), ~ case_when(!. ~ NA_real_, TRUE ~ .)) %>%
     head
akrun
  • I was hoping that the "vs" and "am" columns above would have NA rather than zeros. Thank you. – Jeremy K. Jul 08 '19 at 13:38
  • What does the `!.` refer to here? – Working dollar Mar 05 '23 at 15:39
  • @Workingdollar Here, the `.` refers to the column's values (negated with `!`), so `!.` returns TRUE for every 0 and FALSE for everything else. The zeros are therefore converted to NA, and the other values are kept as they are by `TRUE ~ .` – akrun Mar 05 '23 at 18:16
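
To make the `!.` trick above concrete, here is a tiny sketch on a made-up vector (x and out are illustrative names, not from the answer). Negating a numeric vector coerces it to logical first, so only zeros come out as TRUE.

```r
library(dplyr)

x <- c(0, 1, 2, 0)

# Logical negation of numbers: 0 becomes TRUE, anything non-zero becomes FALSE
!x

# The same case_when() pattern as in the answer, applied to a plain vector
out <- case_when(!x ~ NA_real_, TRUE ~ x)
out
```

Writing the condition as `. == 0` instead of `!.` should behave the same way here and may be easier to read.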

I was looking for a way to change certain values (values < 2) in my data frame to NA. One of my columns holds date-times, and na_if() was definitely not working. I used Ronak's suggestion; my only edit is to convert the tibble to a data frame.

Suggested solution:

df <- df %>% mutate_at(vars(vs, am), ~replace(., . == 0, NA))

Excerpt/Application in my script:

wave_T_max <- wave_T_max %>% 
      mutate_at(vars(Value), ~replace(., . < 2, NA))
Hadi GhahremanNezhad
I Bee