
I have a data frame with variables V1, V2, e1, e2, and I want to add up V1 and e1, and V2 and e2. It should work for numbers 1 to n, where n is an argument of the function this code is embedded in.

The following code is what I have now, and it works. But it creates all possible sums, such as V1 + e2, which I don't want.

n <- seq_along(1:2)

df <- data.frame(V1=runif(5), V2=runif(5, min = 3,max = 5), e1=100, e2=10)

df %>%
  mutate(across(.cols =  n, .fns = ~ across(starts_with("V")) + across(starts_with("e")) , .names ="{'U'}_{n}")) 

Another way that works is this:

library(dplyr)
library(purrr)

map_dfc(.x = seq_along(n),
        .f = function(ix) {
          df %>%
            mutate(!!paste0("U_", ix, ".V", ix) := .data[[paste0("V", ix)]] + .data[[paste0("e", ix)]]) %>%
            select(paste0("U_", ix, ".V", ix))
        }) %>%
  bind_cols(df, .)

But I don't like it, because I want to avoid paste0(), and I want to iterate inside mutate() rather than over it.

Thanks a lot for any help.

Darren Tsai
Please don't post code like `rm(list = ls())` in your questions unless it is absolutely necessary. No one wants to copy your code and accidentally run that, losing whatever they were working on. – Gregor Thomas Feb 22 '23 at 15:34

3 Answers


Your code creates sums of all combinations because you use a nested across(). Just move the inner across() out and add the two results together:

df %>%
  mutate(across(starts_with("V"), .names = "{.col}_e") + across(starts_with("e")))

#   V1 V2  e1 e2 V1_e V2_e
# 1  3  3 100 10  103   13
# 2  2  1 100 10  102   11
# 3  5  2 100 10  105   12
# 4  4  5 100 10  104   15
# 5  1  4 100 10  101   14
Data
set.seed(123)
df <- data.frame(V1 = sample(5), V2 = sample(5), e1 = 100, e2 = 10)
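Because the question wants the number of pairs to come from a function argument n, one option is tidyselect's num_range() (re-exported by dplyr), which selects V1, ..., Vn and e1, ..., en by number, so no paste0() is needed. The following is a minimal sketch assuming dplyr >= 1.0.0; add_pairs() is a hypothetical helper name, and the small data frame is made up for illustration:

```r
library(dplyr)

# Hypothetical helper: adds V<k> + e<k> for k = 1..n inside a single mutate().
add_pairs <- function(df, n) {
  df %>%
    mutate(
      # num_range("V", 1:n) selects V1, ..., Vn in that order (likewise for e),
      # so the two selected data frames line up column by column before `+`.
      # The result keeps the names of the left-hand selection (V1_e, V2_e, ...).
      across(num_range("V", seq_len(n)), .names = "{.col}_e") +
        across(num_range("e", seq_len(n)))
    )
}

df <- data.frame(V1 = c(3, 2), V2 = c(3, 1), e1 = 100, e2 = 10)
res <- add_pairs(df, n = 2)
res
```

Calling add_pairs(df, n = 1) would add only V1_e, which is what lets n control how many pairs are summed.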
Darren Tsai

With across2() from the dplyover package:

library(dplyr)
library(dplyover)
df %>%
  mutate(across2(starts_with("V"), starts_with("e"), ~ .x + .y))

Output:

 V1 V2  e1 e2 V1_e1 V2_e2
1  3  3 100 10   103    13
2  2  1 100 10   102    11
3  5  2 100 10   105    12
4  4  5 100 10   104    15
5  1  4 100 10   101    14
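The pairing idea behind this can be illustrated in base R with Map(), which walks two column selections in parallel and applies `+` to the first pair, then the second, and so on. This is only a sketch of the concept, not how dplyover implements across2(); the data frame and the "V1_e1"-style names are assumptions chosen to mirror the output above:

```r
df <- data.frame(V1 = c(3, 2), V2 = c(3, 1), e1 = 100, e2 = 10)

v_cols <- grep("^V", names(df), value = TRUE)  # "V1" "V2"
e_cols <- grep("^e", names(df), value = TRUE)  # "e1" "e2"

# Map() adds the i-th V column to the i-th e column: pairing is purely by position.
sums <- Map(`+`, df[v_cols], df[e_cols])
names(sums) <- paste0(v_cols, "_", e_cols)

out <- df
out[names(sums)] <- sums
out
```

Since the pairs are formed by position, this only gives V1 + e1 and V2 + e2 when both selections come back in the same order.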
akrun

While the approaches above are great and yield the desired result, they have one drawback: they both assume that the columns are in the correct order.

When we are dealing with a large dataset where we cannot be sure that the columns are in the correct order, we can use dplyover::over() and construct the column names ourselves using .() (disclaimer: I'm the maintainer of 'dplyover').

Below we use cut_names("^V") to get the part of each column name that follows the leading "V" (here, the numbers). Then we use .() to evaluate the string "V{.x}", in which {.x} is replaced by the number we are currently looping over.

This approach is safe even if the columns are in an arbitrary order.

library(dplyr)
library(dplyover) # https://timteafan.github.io/dplyover/

df <- data.frame(V1=runif(5),
                 V2=runif(5, min = 3,max = 5),
                 e1=100,
                 e2=10)

df %>%
  mutate(over(cut_names("^V"),  # yields `c("1", "2")`
              ~ .("V{.x}") + .("e{.x}"),
              .names = "Ve_{x}"))
#>          V1       V2  e1 e2     Ve_1     Ve_2
#> 1 0.5283220 4.027499 100 10 100.5283 14.02750
#> 2 0.2488303 3.781645 100 10 100.2488 13.78165
#> 3 0.3434550 4.810805 100 10 100.3435 14.81081
#> 4 0.1868810 3.671601 100 10 100.1869 13.67160
#> 5 0.6652419 3.715733 100 10 100.6652 13.71573

Created on 2023-02-23 by the reprex package (v2.0.1)
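The same name-based matching can be sketched in base R to show why it does not depend on column order: each V<k> column is paired with the column e<k> that has the same suffix, wherever that column sits. This is an illustrative sketch, not dplyover's implementation; the shuffled data frame is made up, and the Ve_ prefix simply mirrors the .names used above:

```r
# Columns deliberately out of order: e2 comes before e1.
df <- data.frame(e2 = 10, V1 = c(3, 2), e1 = 100, V2 = c(3, 1))

v_cols <- grep("^V[0-9]+$", names(df), value = TRUE)  # "V1" "V2"
e_cols <- grep("^e[0-9]+$", names(df), value = TRUE)  # "e2" "e1"

# A positional pairing would add V1 to e2 here, which is the wrong partner.
positional <- Map(`+`, df[v_cols], df[e_cols])

# Suffixes after the leading "V" (similar in spirit to cut_names("^V")).
suffixes <- sub("^V", "", v_cols)

# Look each partner up by name, so its position in the data frame does not matter.
for (k in suffixes) {
  df[[paste0("Ve_", k)]] <- df[[paste0("V", k)]] + df[[paste0("e", k)]]
}
df
```

Here df$Ve_1 is V1 + e1 even though e1 is the third column, while positional$V1 holds V1 + e2.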

TimTeaFan