I'm building a Shiny app in which certain columns of my data frame need to be mutated using a simple linear transformation. Both the number of these columns and their position in the data frame can change. However, the column names follow a specific naming convention that I believe makes a dynamic solution possible. I'm simply stuck on how to accomplish this.

Here are the core features of my data:

  • In the example code below you will see several columns labeled a#. These are the columns that I will use to mutate additional columns into my data frame.
  • In my Shiny app, these a# columns are created dynamically based on the input file selected by the user and the model applied to the data on the back-end.
  • These columns are always generated with the same name (i.e., a#), where the 'a' is constant and the # varies sequentially based on factors in my model that are not relevant to the current question.

Goal:

  • What I would like to be able to do is dynamically apply the linear transformation in the code below to every column in the df that is labeled a#. I have a hunch this involves a dplyr solution that matches on strings, but I'm stuck on how to get the solution to adapt to any a# variable.
  • Preferably, I'd like to use a tidy solution.

Thanks.

Code:

library(tibble)
library(dplyr)


dat <- tibble (
  a1 = rnorm (100, 0, 1),
  b  = rnorm (100, 0, 1),
  a2 = rnorm (100, 0, 1),
  c  = rnorm (100, 0, 1)
)

# single vector working example of the transformation applied to one column (need dynamic version). 

dat <- dat %>%
  mutate(
    a1_T = 10*a1 + 50
  )
bfoste01

1 Answer

Try something like this.

x10_50 <- function(x) {
  10 * x + 50
}

df <- 
  dat %>%
  mutate_at(vars(matches("^a.$")), .funs = list(T = ~x10_50(.)))

By default, mutate_at() overwrites the columns selected in vars(); passing a named list such as list(T = ...) to .funs adds new columns instead. Inside vars() you can use the select() helpers (starts_with(), ends_with(), one_of()) or pass a character vector of column names. Here I used matches() because it accepts a regular expression. ^a.$ means the column name must start with an "a", be followed by exactly one more character, and then end. Note that this pattern will not match names with two or more characters after the "a", such as a10; a pattern like ^a[0-9]+$ would cover those. The list(T = ...) then applies your function and appends "_T" to the names of the resulting columns.

#       a1      b     a2      c  a1_T  a2_T
#    <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl>
#  1.06    0.164 -0.872  1.24   60.6  41.3
# -0.175   0.445  0.330 -2.16   48.2  53.3
#  0.850  -1.67  -0.984 -0.573  58.5  40.2
#  0.0725  0.261  0.681 -1.45   50.7  56.8
#  0.155  -1.16  -0.828 -0.445  51.5  41.7
# -0.818   0.157  0.112  0.715  41.8  51.1
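
If you are on dplyr 1.0.0 or later, the same idea can be written with across(), which supersedes mutate_at(). This is only a sketch under a couple of assumptions: the a10 column is made up here to show a multi-digit suffix, and the stricter pattern ^a[0-9]+$ is my guess at what "a#" should mean (an "a" followed by one or more digits).

```r
library(tibble)
library(dplyr)

set.seed(1)

# Small example data; a10 is a hypothetical column added to show
# that multi-digit suffixes are picked up as well.
dat <- tibble(
  a1  = rnorm(5),
  b   = rnorm(5),
  a2  = rnorm(5),
  a10 = rnorm(5)
)

# The linear transformation from the question.
x10_50 <- function(x) {
  10 * x + 50
}

# across() applies x10_50 to every column whose name matches the regex;
# .names controls the new column names, so the originals are kept.
out <- dat %>%
  mutate(across(matches("^a[0-9]+$"), x10_50, .names = "{.col}_T"))

names(out)
```

The new a1_T, a2_T and a10_T columns are appended after the existing ones, while b is left alone because it does not match the pattern.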

This post has more info: Create new variables with mutate_at while keeping the original ones

yake84