In pandas I frequently perform row-wise operations with a custom function like this:

import pandas as pd

df = pd.DataFrame({'v1': [1, 2, 3], 'v2': [3, 4, 6], 'v3': [3, 4, 5])

def f(row):
    # row is a Series holding one row; sum v1 and v3 only when v2 equals 3
    return sum(row[["v1", "v3"]]) if row.v2 == 3 else 7

df["new_col"] = df.apply(f, axis=1)

What would the equivalent be in dplyr?

Note that function f can possibly use many variables, not just v1-v3, so I would prefer not to name them all when calling the function.

edit: Example code of what I currently have in R. In this solution I am passing a pronoun object (.data), and I am not sure whether that is appropriate.

d <- tibble(v1 = c(1,2,3), v2 = c(3,4,6), v3 = c(3,4,5))

f <- function(row){
  if (row$v2 == 3) sum(something?) else 7
}

d %>% rowwise() %>% mutate(new_column = f(.data)) %>% ungroup()

edit2: Expected output. (Index column not important)

   v1  v2  v3  new_col
0   1   3   3        4
1   2   4   4        7
2   3   6   5        7

Note: I am not looking for a solution to this specific problem. I am interested in a general way to pass rows to a function in R / dplyr, like apply() would in pandas.

  • What have you tried? `mutate`, `if_else`, and `case_when` will likely be helpful – zack Dec 21 '18 at 16:06
  • I am trying something like this: `df %>% rowwise() %>% mutate(new_column = f(.data)) %>% ungroup()`. But then the function argument is a pronoun object, which I'm not sure is optimal. – Jonatan Pallesen Dec 21 '18 at 16:09
  • @JonatanPallesen Try to come up with code in R (not pseudocode), even if it is not working. We can help you from there. Cheers. – M-- Dec 21 '18 at 16:11
  • Just use `ifelse(d$v2==3,d$v1+d$v3,7)`. There is no need for `rowwise()` or loops here, since basic operations in R are vectorized. – nicola Dec 21 '18 at 16:21
  • Or use `ifelse(d$v2 == 3, rowSums(d[!names(d) %in% "v2"]), 7)` if you have too many columns and don't want to name them individually. – Ronak Shah Dec 21 '18 at 16:25

3 Answers

The equivalent dplyr code, passing whole rows as a dataframe to a function, might be:

library(tidyverse)

df <- tibble(v1 = c(1, 2, 3), v2 = c(3, 4, 6), v3 = c(3, 4, 5))

f <- function(row){
  if (row$v2 == 3){
    return(sum(row$v1, row$v3))
  }else{
    return(7)
  }
}

df %>% 
  rowwise() %>% 
  do(row = as_data_frame(.)) %>%
  mutate(new_col = f(row)) %>% 
  unnest()

Out:

# A tibble: 3 x 4
  new_col    v1    v2    v3
    <dbl> <dbl> <dbl> <dbl>
1       4     1     3     3
2       7     2     4     4
3       7     3     6     5
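
On newer dplyr versions, the same idea (handing each row to the function as a one-row data frame) can probably be written without do() and unnest(). The following is only a sketch: it assumes dplyr 1.1.0 or later, where pick() is available, and it reuses the question's data; the name res is just illustrative.

```r
library(dplyr)

d <- tibble(v1 = c(1, 2, 3), v2 = c(3, 4, 6), v3 = c(3, 4, 5))

# f receives a one-row tibble, so row$v2 is a length-one vector
f <- function(row) {
  if (row$v2 == 3) sum(row$v1, row$v3) else 7
}

# Assumption: dplyr >= 1.1.0. Under rowwise(), pick(everything())
# returns the current row as a one-row tibble.
res <- d %>%
  rowwise() %>%
  mutate(new_col = f(pick(everything()))) %>%
  ungroup()
```

Because f only sees the row object, it can use any column without the call site naming them, which is the closest match to df.apply(f, axis=1) in pandas.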
Jack Brookes

If you have a well-contained set of columns that this would apply to, then I suggest your function only be concerned with individual vectors, not single-row frames.

library(dplyr)
d <- tibble(v1 = c(1,2,3), v2 = c(3,4,6), v3 = c(3,4,5))
f <- function(v1, v2, v3) ifelse(v2 == 3, v1 + v3, 7)
d %>% rowwise() %>% mutate(new_column = f(v1, v2, v3)) %>% ungroup()
# # A tibble: 3 x 4
#      v1    v2    v3 new_column
#   <dbl> <dbl> <dbl>      <dbl>
# 1     1     3     3          4
# 2     2     4     4          7
# 3     3     6     5          7

I used ifelse defensively, in case the function is ever applied to groups rather than just single rows. Under rowwise(), it works just as well if you define the function as

f <- function(v1, v2, v3) if (v2 == 3) v1+v3 else 7

In fact, if your real-world logic is no more complex than this, the vectorized ifelse() version does not require rowwise() at all and would therefore be significantly faster. (But I don't know your real needs.)
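
To illustrate the point about dropping rowwise(), here is a minimal sketch using the vectorized ifelse() definition of f from above on the question's data. The name res is only illustrative.

```r
library(dplyr)

d <- tibble(v1 = c(1, 2, 3), v2 = c(3, 4, 6), v3 = c(3, 4, 5))

# Vectorized: operates on whole columns at once
f <- function(v1, v2, v3) ifelse(v2 == 3, v1 + v3, 7)

# No rowwise() needed, because f handles full-length vectors
res <- d %>% mutate(new_column = f(v1, v2, v3))
```

The scalar if/else version of f would not be safe here, since if() expects a length-one condition.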

Alternative:

d %>% mutate(new_column = purrr::pmap_dbl(list(v1,v2,v3), f))
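
If naming every column in the call is the concern, one possible variation is to pass the whole data frame to pmap_dbl(), which supplies each row's values as named arguments. This is a sketch rather than a definitive recipe: the extra column other and the name res are hypothetical, added only to show that `...` absorbs columns the function does not use.

```r
library(dplyr)
library(purrr)

# `other` is a hypothetical extra column that f does not need
d <- tibble(v1 = c(1, 2, 3), v2 = c(3, 4, 6), v3 = c(3, 4, 5),
            other = c("a", "b", "c"))

# Arguments are matched by column name; `...` swallows unused columns
f <- function(v1, v2, v3, ...) if (v2 == 3) v1 + v3 else 7

# A data frame is a list of columns, so pmap_dbl() iterates over its rows
res <- d %>% mutate(new_column = pmap_dbl(d, f))
```

The function still names the columns it reads, but the call site stays the same no matter how many columns the data has.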
r2evans

df %>% mutate(new_col = case_when(v2 != 3 ~ 7, v2 == 3 ~ v1 + v3))

Output:

  v1 v2 v3 new_col
1  1  3  3       4
2  2  4  4       7
3  3  6  5       7
e.matt