How to call a function for each row of a data.frame?

Question

I have a function with several parameters. This function returns a data.frame.

I have another data.frame.

Now I would like to call my function once for each row of my data.frame, using the row's values as parameters, and then rbind the resulting data.frames.

So I thought something like

do.call(rbind, apply(df, 1, f))

is my friend.

But during this call df gets converted to a matrix, and in the process all numbers are converted to characters. So I have to modify my function to convert them back. That's clumsy, and I suspect I'm missing something.

So my question is, how can I do this?

As example see the following code:

Sys.setenv(LANG = "en")
# Create data.frame
df <- data.frame(
  a = c('a', 'b', 'c'),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# My function 
f <- function(x) {
  data.frame(
    x = rep(paste(rep(x[['a']], x[['b']]), collapse=''),x[['b']]),
    y = 2 * x[['b']],
    stringsAsFactors = FALSE
  )
}

apply(df, 1, f)

Here I get the error:

Error in 2 * x[["b"]] : non-numeric argument to binary operator
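
The error can be traced to the coercion step: apply() calls as.matrix() on the data frame, and a matrix holds a single type, so a mix of character and numeric columns becomes all character. A minimal check, re-using the df from above (the names m and row_types are only for illustration):

```r
# Same data as in the question
df <- data.frame(
  a = c("a", "b", "c"),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(df); mixed columns collapse to character
m <- as.matrix(df)
typeof(m)

# Each row therefore reaches the function as a character vector,
# so x[["b"]] is a string by the time f() sees it
row_types <- apply(df, 1, function(x) typeof(x[["b"]]))
row_types
```

Both calls report "character", which is why 2 * x[["b"]] fails inside f.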

So I change function f to function g:

g <- function(x) {
  data.frame(
    x = rep(paste(rep(x[['a']], as.numeric(x[['b']])), collapse=''), as.numeric(x[['b']])),
    y = 2 * as.numeric(x[['b']]),
    stringsAsFactors = FALSE
  )
}

Now I can call

do.call(rbind, apply(df, 1, g))

and I get

    x y
1   a 2
2  bb 4
3  bb 4
4 ccc 6
5 ccc 6
6 ccc 6

I tried to use a for-loop.

result <- f(df[1,])
for(i in 2:nrow(df)){
  result <- rbind(result, f(df[i,]))
}
result

That works, but it can't be the R way: for-loops aren't "R-ish", and too much can go wrong. For example, df might be empty or have only one row.
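
The edge cases are real: with one row, 2:nrow(df) evaluates to c(2, 1), and with zero rows it evaluates to c(2, 1, 0), so the loop runs on rows that do not exist. A sketch of a loop that avoids this by iterating over seq_len(nrow(df)) and collecting the pieces in a list; rbind_rows is a hypothetical helper name, not something from base R:

```r
df <- data.frame(
  a = c("a", "b", "c"),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# f as defined in the question: takes a one-row data.frame
f <- function(x) {
  data.frame(
    x = rep(paste(rep(x[["a"]], x[["b"]]), collapse = ""), x[["b"]]),
    y = 2 * x[["b"]],
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: call fun on every one-row slice, then stack the results
rbind_rows <- function(data, fun) {
  pieces <- vector("list", nrow(data))
  for (i in seq_len(nrow(data))) {       # seq_len(0) is empty, so no iterations
    pieces[[i]] <- fun(data[i, , drop = FALSE])
  }
  do.call(rbind, pieces)                 # rbind() with no arguments gives NULL
}

result  <- rbind_rows(df, f)
one_row <- rbind_rows(df[1, ], f)
no_rows <- rbind_rows(df[0, ], f)
```

Because each slice stays a data.frame, the column types are preserved and f needs no as.numeric() calls. For an empty input the helper returns NULL rather than a zero-row data.frame, which the caller may want to handle.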

So what's the base-R or dplyr/tidyverse solution?

I suggest reversing the order of your post -- start with what you _want_ to do (sample input & output), then show us what you've tried. I'm a bit confused after a first read-through. — MichaelChirico, Jan 10 '18 at 17:11
Are you aware of `?strrep`? For instance `strrep(df$a,df$b)` is a good starting point. — nicola, Jan 10 '18 at 17:13
note that `apply` almost immediately converts `df` to a `matrix`, so `x[['b']]` is already `character` right away — MichaelChirico, Jan 10 '18 at 17:15
@nicola To repeat a string is not the point. That's just an example function. I'd like to call a function for each row of df and rbind the result. — JerryWho, Jan 10 '18 at 17:16
@MichaelChirico Yes, that's the reason why I'm looking for a better solution. — JerryWho, Jan 10 '18 at 17:17
Building on @nicola's comment: `data.frame(x = rep(strrep(df$a, df$b), df$b), y = rep(df$b * 2, df$b))` — Jaap, Jan 10 '18 at 17:17

score 7 · Answer 1 · answered Jan 10 '18 at 17:18

Well, apply() is meant for matrices and doesn't play well with data.frames. It really should be avoided in cases like these. It's better to write functions that take proper parameters rather than require passing data.frame rows.

f <- function(a, b) {
  data.frame(
    x = rep(paste(rep(a, b), collapse=''), b),
    y = 2 * b,
    stringsAsFactors = FALSE
  )
}

Then you can use a more conventional map() style approach (especially easy if using just two columns)

purrr::map2_df(df$a, df$b, f)

With more columns, (and column names that match the parameter names), you can use

purrr::pmap_df(df, f)
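
If purrr is not available, roughly the same thing can be sketched in base R with mapply(). This assumes the two-parameter f(a, b) defined above; the names pieces, res, pieces_all and res_all are only for illustration:

```r
df <- data.frame(
  a = c("a", "b", "c"),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

f <- function(a, b) {
  data.frame(
    x = rep(paste(rep(a, b), collapse = ""), b),
    y = 2 * b,
    stringsAsFactors = FALSE
  )
}

# Two columns passed explicitly. SIMPLIFY = FALSE keeps a list of data.frames;
# USE.NAMES = FALSE stops the values of df$a from being used as list names,
# which would otherwise leak into the row names after rbind().
pieces <- mapply(f, df$a, df$b, SIMPLIFY = FALSE, USE.NAMES = FALSE)
res <- do.call(rbind, pieces)

# All columns at once, matched to f's parameters by column name
pieces_all <- do.call(
  mapply,
  c(list(FUN = f, SIMPLIFY = FALSE, USE.NAMES = FALSE), df)
)
res_all <- do.call(rbind, pieces_all)
```

The columns are passed as separate vectors, so each keeps its own type and nothing is coerced to character.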

MrFlick


That's interesting. Thanks. I'll try to apply this to my real problem. – JerryWho Jan 10 '18 at 17:25
My "real world-function" f has indeed more than two columns. So first I thought I can pass them as "..." parameter. But that doesn't work. So I tried pmap_dfr. But pmap_dfr destroys variables of class Date (https://github.com/tidyverse/purrr/issues/358). – JerryWho Jan 13 '18 at 13:52

MichaelChirico · Answer 2 · 2018-01-10T17:24:11.767

I believe you can do this quite cleanly in data.table:

library(data.table)
setDT(df)
df[ , .(x = rep(paste(rep(a, b), collapse = ''), b), y = 2*b), 
   keyby = seq_len(nrow(df))]
#    seq_len   x y
# 1:       1   a 2
# 2:       2  bb 4
# 3:       2  bb 4
# 4:       3 ccc 6
# 5:       3 ccc 6
# 6:       3 ccc 6

The keyby = seq_len(nrow(df)) part is the clunkiest bit; this in particular is the subject of a few enhancement requests for data.table, e.g., #1063

score 2 · Answer 3 · answered Jan 17 '18 at 22:10

tidyverse answer:

> library(dplyr); library(purrr)
> df %>% split(seq_len(nrow(df))) %>% map(f) %>% bind_rows()
    x y
1   a 2
2  bb 4
3  bb 4
4 ccc 6
5 ccc 6
6 ccc 6

You can split df by rows (which gives you a list of one-row data frames), then map the function over each piece (where the function returns a data frame), then bind_rows() it all back together.
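
The same split, apply, combine pattern can be sketched without any packages, assuming the original one-argument f from the question (the names rows and res are only for illustration):

```r
df <- data.frame(
  a = c("a", "b", "c"),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

f <- function(x) {
  data.frame(
    x = rep(paste(rep(x[["a"]], x[["b"]]), collapse = ""), x[["b"]]),
    y = 2 * x[["b"]],
    stringsAsFactors = FALSE
  )
}

# One list element per row; each element is still a data.frame,
# so the numeric column stays numeric
rows <- split(df, seq_len(nrow(df)))

# Apply f to each one-row piece and stack the results
res <- do.call(rbind, lapply(rows, f))

# rbind() builds row names from the list names; reset them to 1..n
rownames(res) <- NULL
res
```

Unlike apply(df, 1, f), nothing is converted to a matrix here, so f can be used unchanged.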

twedl


Nice thought :) Though a bit strange to call it "tidyverse answer" in gigantic text while the provided solution just contains a (rather slow) `for` loop packed into `map()` and a couple of pipes... – MS Berends Mar 03 '22 at 10:06

MS Berends · Answer 4 · 2022-03-03T10:10:33.773

No real tidyverse answers here yet.

I also think apply() is the most sensible function here, but I wrote a function to make it work in dplyr verbs, with support for the tidyverse selection language such as starts_with() and where(...):

row_function <- function(fn, ..., data = NULL) {
  if (is.null(data)) {
    data <- dplyr::cur_data()
  } else if (!is.data.frame(data)) {
    stop("'data' must be a data.frame", call. = FALSE)
  }
  if (tryCatch(length(list(...)) > 0, error = function(e) TRUE)) {
    data <- dplyr::select(data, ...)
  } 
  apply(data, 1, fn)
}

Demo:

library(dplyr)

iris %>% 
  mutate(max = row_function(max, where(is.numeric)),
         sepal_mean = row_function(mean, starts_with("Sepal"))) %>% 
  head()

#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species max sepal_mean
#> 1          5.1         3.5          1.4         0.2  setosa 5.1       4.30
#> 2          4.9         3.0          1.4         0.2  setosa 4.9       3.95
#> 3          4.7         3.2          1.3         0.2  setosa 4.7       3.95
#> 4          4.6         3.1          1.5         0.2  setosa 4.6       3.85
#> 5          5.0         3.6          1.4         0.2  setosa 5.0       4.30
#> 6          5.4         3.9          1.7         0.4  setosa 5.4       4.65

The actual tidyverse solution is much less convenient, since it requires rowwise() and c_across(), and transforms the data to a 'rowwised' tibble:

library(dplyr)
iris %>%
  rowwise() %>%
  mutate(sepal_mean = mean(c_across(starts_with("Sepal"))))

#> # A tibble: 150 × 6
#> # Rowwise: 
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species sepal_mean
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>        <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa        4.3 
#>  2          4.9         3            1.4         0.2 setosa        3.95
#>  3          4.7         3.2          1.3         0.2 setosa        3.95
#>  4          4.6         3.1          1.5         0.2 setosa        3.85
#>  5          5           3.6          1.4         0.2 setosa        4.3 
#>  6          5.4         3.9          1.7         0.4 setosa        4.65
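
For the two specific summaries in the demo (row mean and row maximum), base R already has vectorised helpers that need neither rowwise() nor apply(). A small sketch on the built-in iris data; the variable names are only for illustration:

```r
# Columns whose names start with "Sepal"
sepal_cols <- startsWith(names(iris), "Sepal")

# Row-wise mean over those columns
sepal_mean <- rowMeans(iris[sepal_cols])

# Row-wise maximum over all numeric columns: pmax() takes the columns
# as separate arguments, so pass them with do.call()
num_cols <- vapply(iris, is.numeric, logical(1))
row_max <- do.call(pmax, unname(as.list(iris[num_cols])))

head(data.frame(iris, max = row_max, sepal_mean = sepal_mean))
```

This only covers functions that have a vectorised row-wise counterpart; an arbitrary function per row still needs one of the other approaches.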

Note that you could just do `across(starts_with("Sepal")) %>% apply(1, mean)` in `mutate()` without the need for `rowwise()` or a wrapper function. Also, I believe the OP is asking about how to _not_ treat the row values as a vector (which causes the problematic type coercion in the question), but instead a 1-row data frame. — Mikko Marttila, Mar 03 '22 at 10:40
Thanks, great addition. Regarding the OPs question, the title is literally "How to call a function for each row of a data.frame?", to which I provided two answers. — MS Berends, Mar 03 '22 at 20:31

score 0 · Answer 5 · answered Mar 03 '22 at 10:21

dplyr 1.0, released in 2020, brought a few key improvements that make workflows like this much easier to handle in the tidyverse. The key points are across(), which lets you select columns inside dplyr verbs as a data frame; summarise() allowing the result to contain an arbitrary number of rows; and automatic unpacking of unnamed data.frame results into separate columns in transforming verbs like mutate() and summarise().

With the original setup:

df <- data.frame(
  a = c("a", "b", "c"),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

f <- function(x) {
  data.frame(
    x = rep(paste(rep(x[["a"]], x[["b"]]), collapse = ""), x[["b"]]),
    y = 2 * x[["b"]],
    stringsAsFactors = FALSE
  )
}

We can now do:

library(dplyr, warn.conflicts = FALSE)

df %>% 
  rowwise() %>% 
  summarise(
    f(across())
  )
#> # A tibble: 6 x 2
#>   x         y
#>   <chr> <dbl>
#> 1 a         2
#> 2 bb        4
#> 3 bb        4
#> 4 ccc       6
#> 5 ccc       6
#> 6 ccc       6

Here rowwise() groups the data by each row, across() selects all columns, creating a 1-row data frame, and the data.frame result of f() is automatically unpacked to create many new columns.
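
As far as I know, dplyr 1.1.0 and later deprecate both summarise() results with more than one row per group and across() called without arguments, pointing to reframe() and pick() instead. A sketch of the same pipeline under that assumption (requires dplyr >= 1.1.0; res is only an illustrative name):

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  a = c("a", "b", "c"),
  b = c(1, 2, 3),
  stringsAsFactors = FALSE
)

f <- function(x) {
  data.frame(
    x = rep(paste(rep(x[["a"]], x[["b"]]), collapse = ""), x[["b"]]),
    y = 2 * x[["b"]],
    stringsAsFactors = FALSE
  )
}

res <- df %>%
  rowwise() %>%
  # pick(everything()) yields the current row as a 1-row data frame;
  # reframe() accepts results with any number of rows per row-group
  reframe(f(pick(everything())))

res
```

The structure is unchanged from the summarise() version; only the two verbs are swapped for their newer counterparts.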
