lapply() to use a function over multiple columns of a dataframe

Question

I am tracking the body weights of individuals over time, and the function below allows me to calculate the % body weight of an individual on a particular day relative to the initial value (essentially dividing the body weight on that day by the body weight observed on day 1).

library(dplyr)

variability <- function(df, column_number) {
  variable_name <- paste0("var_BW", column_number)

  df %>%
    mutate(!!variable_name := round(100 * (df[, column_number] / df[1, column_number]), 1))
}

This function works fine on a single column, but since I have a number of individuals, I would like to use a function from the apply() family to run it on multiple columns of one dataframe (for instance, columns 1:8 of the dataframe below):

 BW1  BW2  BW3  BW4  BW5  BW6  BW7  BW8
1 18.4 19.6 20.7 17.4 18.7 18.9 19.0 17.8
2 18.1 19.3 20.0 17.5 18.3 19.4 19.5 18.0
3 17.7 18.9 20.4 17.3 18.3 19.2 19.3 17.9

My initial guess is to store the column numbers in a list and then pass that list as an argument to lapply(), like this:

l <- list(1:8)
lapply(working_df, variability, l)

However, when I do that, I get the following error:

Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "c('double', 'numeric')"

Any thoughts?

I think you are interested in the `apply()` function. Check `apply(working_df, 2, variability)` — Carles, Dec 29 '18 at 00:11
The `sweep` function might be the easiest here; I feel like it's often overlooked. ```100*sweep(BW,2,unlist(BW[1,]),`/`)``` — smarchese, Dec 29 '18 at 00:16
To answer your question about the `mutate()` error, that's because you're calling mutate on an object that is not a data frame. — thus__, Dec 29 '18 at 00:23
It would be worthwhile posting the output of your original function, or stating explicitly what it does, since I can see that some of the responses have assumed that you are indexing row-wise rather than column-wise. I'm assuming that they weren't in front of a computer to run the code to check. — g_t_m, Dec 29 '18 at 05:45
@g_t_m you're right, I was indexing column-wise rather than row-wise. The output that both you and AkselA posted below is what I was aiming for. Thanks — ThomasC, Dec 30 '18 at 00:58
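The comment about mutate() above can be made concrete: lapply(working_df, variability, l) walks over the columns of working_df, so each call receives a single numeric column as `df` (and the whole list(1:8) as `column_number`), which mutate() cannot handle. One way to keep the question's variability() function unchanged is to iterate over the column numbers instead and let base R's Reduce() pass the growing data frame from one call to the next. A minimal sketch, assuming a three-column working_df built from the first three columns of the question's data:

```r
library(dplyr)

# First three columns of the question's data
working_df <- data.frame(
  BW1 = c(18.4, 18.1, 17.7),
  BW2 = c(19.6, 19.3, 18.9),
  BW3 = c(20.7, 20.0, 20.4)
)

# The question's function, unchanged apart from formatting
variability <- function(df, column_number) {
  variable_name <- paste0("var_BW", column_number)
  df %>%
    mutate(!!variable_name := round(100 * (df[, column_number] / df[1, column_number]), 1))
}

# Reduce() computes variability(variability(variability(working_df, 1), 2), 3),
# so every call receives a data frame and each call adds one var_BW column
result <- Reduce(variability, 1:3, init = working_df)
result
```

With the full data, the column numbers would be passed as a plain vector such as 1:8, not wrapped in list().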

AkselA · Answer 1 · 2018-12-29T01:19:53.357

Does this fit?
As it's possible to vectorize the relative percentage part we can simplify things greatly.

bw <- read.table(text="
 BW1  BW2  BW3  BW4  BW5  BW6  BW7  BW8
1 18.4 19.6 20.7 17.4 18.7 18.9 19.0 17.8
2 18.1 19.3 20.0 17.5 18.3 19.4 19.5 18.0
3 17.7 18.9 20.4 17.3 18.3 19.2 19.3 17.9", header=TRUE)

apply(bw, 2, function(x) round(100*x/x[1], 1))
#     BW1   BW2   BW3   BW4   BW5   BW6   BW7   BW8
# 1 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
# 2  98.4  98.5  96.6 100.6  97.9 102.6 102.6 101.1
# 3  96.2  96.4  98.6  99.4  97.9 101.6 101.6 100.6
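Since the question title asks about lapply(): a data frame is a list of columns, so lapply() can take the same anonymous function directly, with each column arriving as a numeric vector. A sketch that binds the results back on as new columns; the var_ prefix is chosen here only to mirror the question's var_BW naming:

```r
bw <- data.frame(
  BW1 = c(18.4, 18.1, 17.7),
  BW2 = c(19.6, 19.3, 18.9),
  BW3 = c(20.7, 20.0, 20.4)
)

# lapply() iterates over the columns; each x is one individual's weights
pct <- lapply(bw, function(x) round(100 * x / x[1], 1))

# Name the new columns after the originals and append them
names(pct) <- paste0("var_", names(bw))
result <- cbind(bw, as.data.frame(pct))
result
```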

Or using sweep()

round(sweep(bw, 2, unlist(bw[1,]), "/")*100, 1)
#     BW1   BW2   BW3   BW4   BW5   BW6   BW7   BW8
# 1 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
# 2  98.4  98.5  96.6 100.6  97.9 102.6 102.6 101.1
# 3  96.2  96.4  98.6  99.4  97.9 101.6 101.6 100.6

Or even simpler

round(100 * t(t(bw) / as.matrix(bw)[1,]), 1)
#     BW1   BW2   BW3   BW4   BW5   BW6   BW7   BW8
# 1 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
# 2  98.4  98.5  96.6 100.6  97.9 102.6 102.6 101.1
# 3  96.2  96.4  98.6  99.4  97.9 101.6 101.6 100.6
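The double transpose is needed because R recycles a vector down the columns of a matrix, not across its rows. A toy 2 x 2 matrix (made up for illustration) shows the difference:

```r
m <- matrix(c(2, 4, 6, 8), nrow = 2)  # column 1 is (2, 4), column 2 is (6, 8)

# Wrong: c(2, 6) is recycled element by element down the columns,
# so m[2, 1] is divided by 6 instead of 2
wrong <- m / m[1, ]

# Right: after t(), each original column is a row, and recycling
# c(2, 6) down the columns of t(m) pairs each row with its own value
right <- t(t(m) / m[1, ])
right
```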

score 0 · Answer 2 · answered Dec 29 '18 at 00:21

You don't really need apply in this case.

pctvals <- round(100.0 * bw[,1:ncol(bw)] / bw[,1], 2)
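As the comments point out, this divides every value by the BW1 value in the same row (row-wise), whereas the question divides each column by its own day-1 value (column-wise). A toy data frame (made up for illustration) makes the difference visible, using the sweep() form from the first answer for the column-wise case:

```r
df <- data.frame(a = c(2, 4), b = c(6, 8))

# Row-wise: the vector df[, 1] is recycled down every column,
# so each row is divided by its own value of column a
row_wise <- df / df[, 1]

# Column-wise: each column is divided by its own first value
col_wise <- sweep(df, 2, unlist(df[1, ]), "/")
```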

yields

  BW1    BW2    BW3   BW4    BW5    BW6    BW7    BW8
1 100 106.52 112.50 94.57 101.63 102.72 103.26  96.74
2 100 106.63 110.50 96.69 101.10 107.18 107.73  99.45
3 100 106.78 115.25 97.74 103.39 108.47 109.04 101.13

score 0 · Accepted Answer · answered Dec 29 '18 at 05:32

There's a super simple option using mutate_at() from the dplyr package:

library(dplyr)

working_df <-
  data.frame(BW1 = c(18.4, 18.1, 17.7),
             BW2 = c(19.6, 19.3, 18.9),
             BW3 = c(20.7, 20.0, 20.4))

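variability_v2 <- function(df, column_numbers) {
  # Apply the same percentage calculation to every selected column;
  # the list name "var" becomes the suffix of each new column name
  df %>%
    mutate_at(vars(column_numbers), list(var = ~ round(100 * (. / first(.)), 1)))
}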

variability_v2(working_df, 1:3)
#>    BW1  BW2  BW3 BW1_var BW2_var BW3_var
#> 1 18.4 19.6 20.7   100.0   100.0   100.0
#> 2 18.1 19.3 20.0    98.4    98.5    96.6
#> 3 17.7 18.9 20.4    96.2    96.4    98.6

The only two issues with this method (both very minor, in my opinion) are:

1. If you only feed a single column number into the function, then the new column is simply called "var".
2. The "var" is appended after the column name, not before it.

The former could be dealt with by a simple "if" statement within the function, carving out the situation where there is only one column specified. Hopefully you just don't care about the latter!
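In dplyr 1.0.0 and later, mutate_at() is superseded by across(), whose .names argument (a glue-style template) handles both points: the prefix can go before the column name, and a single selected column still gets a descriptive name. A sketch, assuming dplyr >= 1.0.0; variability_v3 is a hypothetical name:

```r
library(dplyr)

working_df <- data.frame(
  BW1 = c(18.4, 18.1, 17.7),
  BW2 = c(19.6, 19.3, 18.9),
  BW3 = c(20.7, 20.0, 20.4)
)

variability_v3 <- function(df, column_numbers) {
  df %>%
    mutate(across(all_of(column_numbers),            # select by position
                  ~ round(100 * .x / first(.x), 1),  # % of day-1 value
                  .names = "var_{.col}"))             # prefix, not suffix
}

variability_v3(working_df, 1:3)  # adds var_BW1, var_BW2, var_BW3
variability_v3(working_df, 2)    # adds var_BW2 (no bare "var" column)
```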

Thanks, that's exactly what I wanted. Will need to read up on mutate_at() though! — ThomasC, Dec 30 '18 at 00:53
