
How does one replace the values of a subset in R with Tidyverse?

Using the cars data as an example, if I want to change every speed lower than 30 to 0, I can use the command below:

 cars[cars$speed < 30, "speed"] <- 0
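One detail worth checking: indexing by rows alone (`df[cond, ] <- 0`) overwrites every column of the matching rows, so `dist` is zeroed along with `speed`. A minimal base-R check, working on copies so the built-in `cars` is untouched (the object names `rows_only` and `row_and_col` are illustrative):

```r
# Copies of the built-in dataset, so `cars` itself is left untouched
rows_only   <- cars
row_and_col <- cars

# Row-only indexing: every column of the matching rows is set to 0
rows_only[rows_only$speed < 30, ] <- 0

# Row + column indexing: only the `speed` column is set to 0
row_and_col[row_and_col$speed < 30, "speed"] <- 0

head(rows_only, 3)    # speed and dist are both 0
head(row_and_col, 3)  # speed is 0; dist keeps its original values
```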

With Tidyverse, one can select the same subset and modify it with more readable commands:

 cars %>% filter(speed < 30) %>% mutate(speed = 0)

However, this changes only the subset extracted from cars (returned as a new data frame), not the values of the observations within cars itself.
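To make the difference concrete, here is a sketch on a small hypothetical data frame (`df` and its values are made up for illustration): the pipeline returns a new, shorter data frame, and the original is left as it was.

```r
library(dplyr)

# Hypothetical toy data: one row is above the threshold
df <- data.frame(speed = c(10, 25, 40), dist = c(5, 20, 60))

sub <- df %>% filter(speed < 30) %>% mutate(speed = 0)

nrow(sub)  # 2: filter() dropped the row with speed 40
df$speed   # still 10 25 40: the original data frame is unchanged
```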

I might have missed something obvious, but is there an intuitive way to do the same thing with Tidyverse as well? While the base R indexing works fine in most cases, it becomes really unwieldy when there are more than five conditions to meet.

Carl H
  • I think this may be a case where you should not try to make everything "tidy". `replace` is just `x[list] <- values`, which is pretty much what you have done, and it is (IMO) just as readable, with one less abstraction. – hrbrmstr Apr 11 '17 at 01:42
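The comment's point can be checked directly: printing `replace` at the R console shows that its body is just `x[list] <- values` followed by `x`, i.e. it modifies a copy and returns it. A minimal check with a hypothetical vector `v`:

```r
# Hypothetical example vector
v <- c(12, 35, 28, 40)

# replace() returns a modified copy; v itself is not changed
via_replace <- replace(v, v < 30, 0)

# The same result via subset assignment on a copy
via_index <- v
via_index[via_index < 30] <- 0

identical(via_replace, via_index)  # TRUE
v                                  # still 12 35 28 40
```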


You could use the replace function:

cars %>% mutate(speed = replace(speed, speed < 30, 0))
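Because the second argument of `replace` is just a logical vector, several conditions can be combined with `&` or `|` inside the same call, which keeps the `mutate` readable as the number of conditions grows. A sketch on `cars`; the extra `dist` thresholds are arbitrary and only for illustration:

```r
library(dplyr)

result <- cars %>%
  mutate(
    # Set speed to 0 only where all (illustrative) conditions hold;
    # add further conditions with & or | as needed
    speed = replace(speed, speed < 30 & dist > 20 & dist < 60, 0)
  )

# All 50 rows are kept; only matching rows have speed 0
head(result)
```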

An ifelse condition would also work:

cars %>% mutate(speed = ifelse(speed < 30, 0, speed))

I tested this on a one-million-row data frame, and replace ran in about one-eighth the time of ifelse (median of roughly 11 ms versus 95 ms); dplyr's if_else fell in between at roughly 52 ms.

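library(dplyr)           # %>%, mutate(), if_else()
library(microbenchmark)

set.seed(2)
# One million rows of uniform random values between 0 and 1000
dat = data.frame(x=runif(1e6, 0, 1000), y=runif(1e6, 0, 1000))

# Time each way of setting x < 200 to 0, 100 runs each
microbenchmark(
  replace=dat %>% mutate(x=replace(x, x<200, 0)),
  ifelse=dat %>% mutate(x=ifelse(x<200, 0, x)),
  if_else=dat %>% mutate(x=if_else(x<200, 0, x)),
  times=100
)
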
Unit: milliseconds
    expr       min       lq      mean   median        uq      max neval cld
 replace  8.352943  9.55682  18.16755 11.45507  15.33215 224.8759   100 a  
  ifelse 71.782371 87.37754 165.95928 95.12722 262.73016 287.3633   100   c
 if_else 39.947845 47.83934  88.72291 51.99449  59.76760 251.0381   100  b
eipi10
  • Really compelling comparison. Wondering how this would look if you were to use the dplyr command `if_else`. It is supposed to be much faster than the base `ifelse`... – nate-m Mar 16 '18 at 20:02
  • See updated timings. `if_else` is faster than `ifelse` but slower than `replace`. – eipi10 Mar 16 '18 at 20:20
  • It would be cool if there were some kind of `%<>%` analogue for `mutate`, e.g. `cars %>% mutate(speed %=% replace(., . < 30, 0))` (I know there are plenty of reasons this would be difficult to implement, but still ...) – Ben Bolker Mar 16 '18 at 20:33
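Two existing pieces get fairly close to that idea: `dplyr::across()` (added in dplyr 1.0.0, after this comment was written) lets the replacement be written with a `.x` placeholder for the column, and magrittr's `%<>%` pipes the object and assigns the result back in one step. A hedged sketch, working on a copy named `cars2` (an illustrative name):

```r
library(dplyr)     # across() needs dplyr >= 1.0.0
library(magrittr)  # provides the %<>% assignment pipe

cars2 <- cars  # work on a copy of the built-in data

# across() passes each selected column to the lambda as .x, and %<>%
# assigns the piped result back to cars2
cars2 %<>% mutate(across(speed, ~ replace(.x, .x < 30, 0)))

head(cars2)
```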