Say I have a data frame with tens of columns, and my custom function needs each of these columns plus the corresponding number from a vector to produce the desired output. Afterwards, I need to generate new column names based on the original column names in the data frame. How can I accomplish this using the tidyverse, rather than for loops or other base R solutions?

MWE

structure(list(col1 = c(36.0520583373645, 37.9423749063706, 33.6806634587719, 
34.031649012457, 29.5448679963449, NA, 34.7576769718877, 30.484217745574, 
32.9849083643022, 27.4081694831058, 35.8624919654559, 35.0284347997991, 
NA, 32.112605893241, 27.819354948082, 35.6499532124921, 35.0265642403216, 
32.4006569441297, 30.3698557864842, 31.8229364456928, 34.3715903109276
), col2 = c(32.9691195198199, 35.6643664156284, 33.8748732989736, 
34.5436311813644, 33.2228201914256, 38.7621696867191, 34.8399804318992, 
32.9063078995457, 35.7391166214367, 32.7217251282669, 36.3039268989853, 
35.9607654868559, 33.1385915196435, 34.7987649028199, 33.7100463668523, 
34.7773403671057, 35.8592997980752, 33.8537127786535, 31.9106243803505, 
39.3099469314882, 35.1849826815196), col3 = c(33.272278716963, 
NA, 31.8594920410129, 33.1695042551974, 29.3800694974438, 35.1504378875245, 
34.0771487001433, 29.0162879030415, 30.6960024888799, 29.5542117965184, 
34.3726321365982, 36.0602274148362, 33.1207772548047, 31.5506876209822, 
28.8649303491974, 33.4598790144265, 30.5573454464747, 31.6026723913051, 
30.4716061556625, 33.009463000301, 30.846230953425)), row.names = c(NA, 
-21L), class = "data.frame")

Save the above in a file, then read the data frame with example <- dget(file.choose()).
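As a non-interactive alternative to file.choose(), here is a minimal sketch of the dput()/dget() round trip. The small data frame and the temporary file path are assumptions for illustration; in practice the file would hold the full structure(...) text shown above.

```r
# Small stand-in data frame (values rounded; an assumption for brevity)
small <- data.frame(
  col1 = c(36.05, 37.94, NA),
  col2 = c(32.97, 35.66, 33.87)
)

# Write the structure(...) representation to a temporary file
path <- tempfile(fileext = ".R")
dput(small, file = path)

# Read it back without an interactive file picker
restored <- dget(path)
```

Passing an explicit path keeps the example reproducible in scripts, where file.choose() would stop and wait for input.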

Code

library(dplyr)

y <- c(2, 1, 1.5)

customfun <- function(x, y) {
  # scale the natural log of x by y; return the value instead of print()-ing it
  log(x) * y
}

# each column is paired with the element of y at the same position
df <- example %>%
  mutate(col1.log = customfun(col1, y = y[1]),
         col2.log = customfun(col2, y = y[2]),
         col3.log = customfun(col3, y = y[3]))

Question

Imagine I have tens of these columns, not only 3 as in the MWE. How can I generate the new columns dynamically using the tidyverse?

doctorate

2 Answers

The tidyverse is not great for these sweep()-like operations; however, one option could be:

example %>%
 do(., sweep(., 2, FUN = customfun, y)) %>%
 rename_all(~ paste(., "log", sep = "."))

   col1.log col2.log col3.log
1  7.169928 3.495571 5.257087
2  7.272137 3.574152       NA
3  7.033848 3.522674 5.192003
4  7.054582 3.542223 5.252446
5  6.771820 3.503237 5.070475
6        NA 3.657445 5.339456
7  7.096801 3.550766 5.292941
8  6.834418 3.493664 5.051786
9  6.992100 3.576246 5.136199
10 6.621682 3.488039 5.079339
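
For comparison, this is a sketch of what the sweep() call does under the hood, written in base R without customfun. The small data frame is a stand-in with rounded values (an assumption for brevity), and the result here keeps the original columns.

```r
# Small stand-in for the question's data (values rounded; an assumption)
example <- data.frame(
  col1 = c(36.05, 37.94, NA),
  col2 = c(32.97, 35.66, 33.87),
  col3 = c(33.27, NA, 31.86)
)
y <- c(2, 1, 1.5)

# MARGIN = 2 pairs the j-th column with y[j], so column j becomes log(col_j) * y[j]
logged <- sweep(log(as.matrix(example)), 2, y, `*`)
colnames(logged) <- paste0(colnames(logged), ".log")

# Keep the original columns next to the new ones
result <- cbind(example, logged)
```

Converting to a matrix first makes the column-wise pairing explicit; it assumes every column is numeric.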
tmfmnk
  • is it possible to preserve the original columns? – doctorate Dec 21 '19 at 10:26
  • I would say that it is easier to just do a column bind as `example %>% do(., sweep(., 2, FUN = customfun, y)) %>% rename_all(~ paste(., "log", sep = ".")) %>% bind_cols(example)`. – tmfmnk Dec 21 '19 at 10:29
  • Fine, and maybe someone will come up with other answers! – doctorate Dec 21 '19 at 11:59
  • @tmfmnk could you help with this https://stackoverflow.com/questions/59431981/how-to-create-two-independent-drill-down-plot-using-highcharter – John Smith Dec 21 '19 at 16:16

We can use map2_df to apply the function to each column paired with the matching element of y, and bind_cols to add the results as new columns:

library(dplyr)
library(purrr)

bind_cols(example, map2_df(example, y, customfun) %>%
                           rename_all(~paste0(., ".log"))) 

#       col1     col2     col3 col1.log col2.log col3.log
#1  36.05206 32.96912 33.27228 7.169928 3.495571 5.257087
#2  37.94237 35.66437       NA 7.272137 3.574152       NA
#3  33.68066 33.87487 31.85949 7.033848 3.522674 5.192003
#4  34.03165 34.54363 33.16950 7.054582 3.542223 5.252446
#...
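
A variant sketch of the same idea using plain map2(), in case map2_df is unavailable or discouraged in your purrr version (the *_df variants were superseded in purrr 1.0.0). The small data frame is a stand-in with rounded values, and customfun is redefined here to return its value; both are assumptions for illustration.

```r
library(dplyr)
library(purrr)

# Small stand-in for the question's data (values rounded; an assumption)
example <- data.frame(
  col1 = c(36.05, 37.94, NA),
  col2 = c(32.97, 35.66, 33.87),
  col3 = c(33.27, NA, 31.86)
)
y <- c(2, 1, 1.5)
customfun <- function(x, y) log(x) * y

# map2() walks the columns and y in parallel and returns a named list
new_cols <- map2(example, y, customfun) %>%
  set_names(~ paste0(.x, ".log")) %>%
  as.data.frame()

result <- bind_cols(example, new_cols)
```

Renaming the list before binding avoids name clashes with the original columns.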
Ronak Shah
  • nice verse written in the purr forest, I wonder how `map2_df` is any different from the other neighbor `map2_dfc` which seems to give similar output? – doctorate Dec 21 '19 at 18:51
  • @doctorate Yes, in this case you could use any of `map2_df`/`map2_dfc`/`map2_dfr` and get the same output. The outputs differ when you operate on a list. – Ronak Shah Dec 22 '19 at 00:53
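
Neither answer uses it, but with dplyr 1.0.0 or later the same result can be sketched with across() and cur_column(). This assumes y is given names matching the columns (a change from the question's unnamed vector) and uses a small stand-in data frame with rounded values.

```r
library(dplyr)

# Small stand-in for the question's data (values rounded; an assumption)
example <- data.frame(
  col1 = c(36.05, 37.94, NA),
  col2 = c(32.97, 35.66, 33.87),
  col3 = c(33.27, NA, 31.86)
)

# Assumption: y is named by column so each multiplier can be looked up by name
y <- c(col1 = 2, col2 = 1, col3 = 1.5)
customfun <- function(x, y) log(x) * y

result <- example %>%
  mutate(across(
    all_of(names(y)),
    ~ customfun(.x, y[[cur_column()]]),  # cur_column() is the name of the current column
    .names = "{.col}.log"                # new names derived from the original ones
  ))
```

Looking up the multiplier by name rather than by position keeps the pairing correct even if the columns are later reordered.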