I would like to use the succinctness of magrittr and dplyr to copy single values between rows in a subset of columns based on the values in other columns. This is a simple example; I want to apply this idea to many columns of a large dataset with multiple conditions within a long pipe of commands.

Take the dataframe df <- data.frame(a = 1:5, b = 6:10, x = 11:15, y = 16:20):

a   b   x   y
1   6   11  16
2   7   12  17
3   8   13  18
4   9   14  19
5   10  15  20

For the row where a = 5, I would like to replace the values of x and y with those in the row where b = 7, to give:

a   b   x   y
1   6   11  16
2   7   12  17
3   8   13  18
4   9   14  19
5   10  12  17

This attempt fails:

foo <- function(x){ifelse(df$a == 5, df[df$b == 7, .(df$x)], x)}
df %<>%  mutate_each(funs(foo), x, y)

The closest I can get is:

bar <- function(x){ifelse(df$a == 5, df[df$b == 7, "x"], x)}
df %<>%  mutate_each(funs(bar), x, y)

but this is incorrect as it replaces both values with the value from x, rather than x and y respectively.

Thanks for the advice.

  • What's the difference between `%<>%` and `%>%`? – Marcin Dec 08 '15 at 16:20
  • `x %<>% f` comes from the `magrittr` package and is equivalent to the common pattern `x <- x %>% f`. – asachet Dec 08 '15 at 16:27
  • `%>%` also comes from the `magrittr` package... – David Arenburg Dec 08 '15 at 18:12
  • @DavidArenburg `%>%` works with only `dplyr` loaded, whereas `%<>%` currently (with the CRAN version) requires `magrittr` to be loaded. True, `%>%` comes from `magrittr` through `dplyr`, but for the end user, knowing which packages to load is more relevant! – asachet Dec 09 '15 at 12:55
  • @antoine-sac well, that's just, like, your opinion, man. – David Arenburg Dec 09 '15 at 12:57

3 Answers

You could do it using mutate_each and replace:

df %>% mutate_each(funs(replace(., a==5, nth(., which(b==7)))), x, y)

Output:

  a  b  x  y
1 1  6 11 16
2 2  7 12 17
3 3  8 13 18
4 4  9 14 19
5 5 10 12 17

Or, as per @docendodiscimus's comment, it can be shortened further (and [ is probably also better than which):

df %>% mutate_each(funs(replace(., a==5, .[b==7])), x, y)
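In current dplyr releases, mutate_each() and funs() are deprecated in favour of mutate() with across(). A minimal sketch of the same replacement, assuming dplyr >= 1.0.0 and that exactly one row satisfies b == 7 (the name res is used here only for illustration):

```r
library(dplyr)

df <- data.frame(a = 1:5, b = 6:10, x = 11:15, y = 16:20)

# For each of x and y, overwrite the entries where a == 5
# with that same column's value from the row where b == 7.
res <- df %>%
  mutate(across(c(x, y), ~ replace(.x, a == 5, .x[b == 7])))

res
```

Inside across(), .x is the column currently being processed, while a and b are looked up in the data frame, so x and y each receive their own value from the b == 7 row.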
LyzandeR

Just to mention the data.table solution would be:

library(data.table)
setDT(df)[a == 5, c("x", "y") := df[b == 7, .SD, .SDcols = c("x", "y")]]

> df
   a  b  x  y
1: 1  6 11 16
2: 2  7 12 17
3: 3  8 13 18
4: 4  9 14 19
5: 5 10 12 17

Alternatively, you could also use:

cols <- c("x", "y")
setDT(df)[a == 5, (cols) := df[b == 7, .SD, .SDcols = cols]]
# or, selecting the columns with with = FALSE
setDT(df)[a == 5, (cols) := df[b == 7, cols, with = FALSE]]
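Note that setDT() converts df to a data.table by reference, so df itself is modified. If the original data frame should stay untouched, as.data.table() makes a copy first. A minimal sketch, where dt is a name chosen here only for illustration:

```r
library(data.table)

df <- data.frame(a = 1:5, b = 6:10, x = 11:15, y = 16:20)

# as.data.table() copies, so df remains a plain data.frame.
dt <- as.data.table(df)

cols <- c("x", "y")
# Update the a == 5 row of dt in place with the x and y values
# taken from the b == 7 row.
dt[a == 5, (cols) := dt[b == 7, .SD, .SDcols = cols]]

dt
```

Only dt carries the updated row; df keeps its original values.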
Arun
Rentrop
  • `DT <- setDT(df)` makes little to no sense, really, since `df` is now also a data.table and has been updated by reference. – Arun Dec 08 '15 at 16:59
  • @Arun: I totally agree. When I started using `data.table`, the _updating by reference_ concept was very strange to me. As the question asks for a `dplyr` solution, I thought I would make my answer a little more understandable. – Rentrop Dec 08 '15 at 17:10
  • I see. Better to use `as.data.table` then. Also, using `with=FALSE` or `.SD + .SDcols` would help show that it is easily extensible to many cols. – Arun Dec 08 '15 at 17:16
  • @Arun: Have a look at my edit. I do not understand what you mean by using `with=FALSE`. Feel free to edit. – Rentrop Dec 08 '15 at 17:26
  • Nice, thanks! This can also be piped: `df %<>% as.data.table %>% .[a == 5, c("x", "y") := .[b == 7, .SD, .SDcols = c("x", "y")]]` – Patrick Hogan Dec 08 '15 at 18:01
  • I took the liberty of improving your answer. Hope you don't mind. – Jaap Dec 08 '15 at 18:12

If your main requirement is to apply the function within a longer dplyr-pipe, you could do something like the following example:

foo <- function(df, cols = c("x", "y")) {
  df[df$a == 5, cols] <- df[df$b == 7, cols]
  df
}

df %>% ... %>% foo(c("x", "y")) %>% ... 
#  a  b  x  y
#1 1  6 11 16
#2 2  7 12 17
#3 3  8 13 18
#4 4  9 14 19
#5 5 10 12 17
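To make the placeholder pipe concrete, here is a small sketch with foo() in the middle of a chain. The filter() and mutate() steps, the total column, and the name res are assumptions added purely for illustration:

```r
library(dplyr)

df <- data.frame(a = 1:5, b = 6:10, x = 11:15, y = 16:20)

# Copy the values of `cols` from the row where b == 7
# into the row where a == 5, then return the data frame.
foo <- function(df, cols = c("x", "y")) {
  df[df$a == 5, cols] <- df[df$b == 7, cols]
  df
}

res <- df %>%
  filter(a > 1) %>%          # illustrative earlier step
  foo(c("x", "y")) %>%       # row-to-row copy
  mutate(total = x + y)      # illustrative later step

res
```

Because foo() takes the data frame as its first argument and returns a data frame, it slots into the pipe like any dplyr verb.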
talat