Iterative function to subtract columns from a specific column in a dataframe and have the values appear in a new column

Question

Hi everyone and thank you for reading.

I've been stuck trying to create a function that iteratively subtracts the values of two columns and puts the result in a new column. To show what I mean, here is an example with the starting dataset:

Sample  g1   g2   g3    g4   g5 
s001    5    10   15    20   25
s002    6    11   16    21   26
s003    7    12   17    22   27
s004    8    13   18    23   28

Let's say I choose g3. I would then like to subtract each of the columns from g3, with the results showing up in a new column right next to each one. Essentially, the end result would look like this:

Sample  g1  g1dt  g2  g2dt  g3  g3dt  g4  g4dt  g5  g5dt
s001    5   10    10  5     15  0     20  -5    25  -10
s002    6   10    11  5     16  0     21  -5    26  -10
s003    7   10    12  5     17  0     22  -5    27  -10
s004    8   10    13  5     18  0     23  -5    28  -10

The code I tried looked like this:

for (i in 2:6) {
dt <- paste0(names(dataset)[i]) #where names(dataset) is the ith name 
#from dataset
dataset[[dt]] <- dataset$g3 - dataset[[,2:6]] #[[]] is 
#supposed to create a new column "dt" added as a suffix
}

This however results in the following error:

Error in .subset2(x, ..2, exact = exact) : 
recursive indexing failed at level 3

Any idea on what I could otherwise try? Please let me know if I need to clear up any confusing matters. Thanks!

acylam · Answer 1 · 2018-08-13T21:34:02.067

We can do this using mutate_at:

library(dplyr)

myfun <- function(DF, col){
  col_quo <- enquo(col)
  DF %>%
    mutate_at(vars(-Sample), funs(dt = !!col_quo - .)) %>%
    select(Sample, sort(current_vars())) %>%
    rename_all(funs(sub("_", "", .)))
}

myfun(df, g3)

Result:

  Sample g1 g1dt g2 g2dt g3 g3dt g4 g4dt g5 g5dt
1   s001  5   10 10    5 15    0 20   -5 25  -10
2   s002  6   10 11    5 16    0 21   -5 26  -10
3   s003  7   10 12    5 17    0 22   -5 27  -10
4   s004  8   10 13    5 18    0 23   -5 28  -10

Notes:

enquo turns the expression supplied as an argument into a quosure, which is later unquoted with !! in the mutate_at step.
mutate_at applies a function to the columns specified in vars. If the function is given a name, as in dt = !!col_quo - ., new columns are created automatically with _dt as a suffix.
Since the OP wants each output column next to the original, we can sort current_vars() and use select to set the column order while keeping Sample as the first column.
The last rename_all step is optional: if we do not want _ in the suffix, rename_all and sub remove the first _ from each column name.
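
mutate_at, funs and current_vars have been superseded or deprecated in later dplyr releases, so the same idea may need adapting on a current installation. The following is only a sketch of how it could look with across(); it assumes dplyr 1.0 or later, takes the column name as a string instead of a bare name, and myfun_across is a hypothetical name that is not part of the original answer:

```r
library(dplyr)

# Example data, as in the question
df <- data.frame(
  Sample = c("s001", "s002", "s003", "s004"),
  g1 = 5:8, g2 = 10:13, g3 = 15:18, g4 = 20:23, g5 = 25:28,
  stringsAsFactors = FALSE
)

# Hypothetical helper: `col` is the name of the reference column, as a string
myfun_across <- function(DF, col) {
  ref  <- DF[[col]]                     # reference column as a plain vector
  cols <- setdiff(names(DF), "Sample")  # every column except the ID column
  DF %>%
    # one new "<name>dt" column per input column: reference minus that column
    mutate(across(all_of(cols), ~ ref - .x, .names = "{.col}dt")) %>%
    # sort the names so each dt column sits next to its source column
    select(Sample, all_of(sort(c(cols, paste0(cols, "dt")))))
}

res <- myfun_across(df, "g3")
```

Because .names builds the suffix directly, no rename step is needed. Pulling the reference column out as a vector first sidesteps tidy evaluation, at the cost of passing "g3" in quotes.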

Data:

df <- structure(list(Sample = c("s001", "s002", "s003", "s004"), g1 = 5:8, 
    g2 = 10:13, g3 = 15:18, g4 = 20:23, g5 = 25:28), class = "data.frame", 
    row.names = c(NA, -4L))

DanY · Accepted Answer · 2018-08-13T19:49:15.650

This will do what you want. Notice that myfun treats the first column as special, as per your example.

# example data
df <- data.frame(
    Sample = paste0("s00", 1:4),
    g1 = 5:8,
    g2 = 10:13,
    g3 = 15:18,
    g4 = 20:23,
    g5 = 25:28,
    stringsAsFactors = FALSE
)

# function to do what you want
myfun <- function(x, df) {
    mat <- df[[x]] - as.matrix(df[ , names(df)[-1]]) #subtract all cols from x
    colnames(mat) <- paste0(names(df)[-1], "dt")     #give these new cols names
    df <- cbind(df, mat)                             #add new cols to dataframe
    df <- df[ , c(1, order(names(df)[-1])+1)]        #reorder cols
    return(df)
}

# test it
myfun("g3", df)

# result
  Sample g1 g1dt g2 g2dt g3 g3dt g4 g4dt g5 g5dt
1   s001  5   10 10    5 15    0 20   -5 25  -10
2   s002  6   10 11    5 16    0 21   -5 26  -10
3   s003  7   10 12    5 17    0 22   -5 27  -10
4   s004  8   10 13    5 18    0 23   -5 28  -10
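
One caveat about the reordering step: sorting by name puts g1dt right after g1 only while the names happen to sort that way. With more columns (the asker mentions 50 in a comment), a name such as g10 would typically sort between g1 and g1dt. A possible variation, offered as a sketch rather than as part of the accepted answer, interleaves the original and new names explicitly; myfun_interleaved and the small df_wide example are hypothetical:

```r
# Hypothetical variant of myfun that does not rely on alphabetical order
myfun_interleaved <- function(x, df) {
  cols <- names(df)[-1]                                  # all but the first (ID) column
  mat <- df[[x]] - as.matrix(df[, cols, drop = FALSE])   # x minus every column
  colnames(mat) <- paste0(cols, "dt")                    # name the new columns
  out <- cbind(df, mat)                                  # append them to the data frame
  # rbind() stacks the two name vectors as rows; as.vector() then reads the
  # matrix column by column, giving g1, g1dt, g2, g2dt, ...
  out[, c(names(df)[1], as.vector(rbind(cols, colnames(mat))))]
}

# Small example with a column name that breaks alphabetical ordering
df_wide <- data.frame(
  Sample = c("s001", "s002", "s003"),
  g1 = 1:3, g2 = 4:6, g10 = 7:9,
  stringsAsFactors = FALSE
)
res_wide <- myfun_interleaved("g2", df_wide)
```

The columns of res_wide keep the original left-to-right order (g1, g2, g10), each followed by its dt column.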

score 1 · Answer 3 · answered Aug 13 '18 at 19:24

Here is one possible dplyr solution:

library(dplyr)

# reproduce your data frame
df <- tibble(
  Sample = c("s001", "s002", "s003", "s004"),
  g1 = 5:8,
  g2 = 10:13,
  g3 = 15:18,
  g4 = 20:23,
  g5 = 25:28
)

# compute the differences and arrange the order of columns
df %>%
  mutate(
    g1dt = g3 - g1,
    g2dt = g3 - g2,
    g3dt = g3 - g3,
    g4dt = g3 - g4,
    g5dt = g3 - g5
  ) %>%
  select(1, 2, 7, 3, 8, 4, 9, 5, 10, 6, 11)

# # A tibble: 4 x 11
#   Sample    g1  g1dt    g2  g2dt    g3  g3dt    g4  g4dt    g5  g5dt
#   <chr>  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 s001       5    10    10     5    15     0    20    -5    25   -10
# 2 s002       6    10    11     5    16     0    21    -5    26   -10
# 3 s003       7    10    12     5    17     0    22    -5    27   -10
# 4 s004       8    10    13     5    18     0    23    -5    28   -10

OzanStats

I suggest sorting columns by name, which makes the code more readable and makes adding new columns less troublesome! Also, subtracting columns manually for such a (simple) task is not advisable: a lot of coding, and troublesome recoding for each column added! – tobiaspk1 Aug 13 '18 at 19:42
This could work, but unfortunately my original dataset has 50 columns and that would be a bit cumbersome. I was using a practice dataset on here to keep things simple in the question =) – vanish007 Aug 13 '18 at 19:45
@tobiaspk1; I totally agree with your comment. This solution is generic. It is good if the function to apply differs across other columns and names are arbitrary. It is not the most efficient one with this specific task. – OzanStats Aug 13 '18 at 19:49

score 0 · Answer 4 · answered Aug 13 '18 at 19:38

You can simply create a new data frame that holds the differences and bind it to the original.

df_new <- df[, 4] - df[, 2:6]  # g3 (column 4) minus each of columns 2 to 6
colnames(df_new) <- paste0(colnames(df_new), "dt")
df <- cbind(df, df_new)

This solution avoids an explicit loop and scales to any number of columns.

If the order of the columns matters to you, sort them by name while keeping Sample first; this works with your column-naming scheme:

df <- df[, c(1, order(colnames(df)[-1]) + 1)]
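
For comparison, the loop from the question can also be made to work. The error there most likely comes from dataset[[,2:6]]: [[ extracts a single element, so a vector such as 2:6 is treated as recursive indexing into nested levels. A minimal sketch of a repaired loop, assuming the data frame is called dataset as in the question (ref is a hypothetical variable holding the chosen column name):

```r
# Example data, as in the question
dataset <- data.frame(
  Sample = c("s001", "s002", "s003", "s004"),
  g1 = 5:8, g2 = 10:13, g3 = 15:18, g4 = 20:23, g5 = 25:28,
  stringsAsFactors = FALSE
)

ref <- "g3"  # column to subtract the others from

# names(dataset)[2:6] is evaluated once, before any new column is added
for (nm in names(dataset)[2:6]) {
  # [[ takes one column name at a time; paste0() builds the new name, e.g. "g1dt"
  dataset[[paste0(nm, "dt")]] <- dataset[[ref]] - dataset[[nm]]
}
```

The new columns are appended at the end of dataset, so the reordering step above still applies if they should sit next to their source columns.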

score 0 · Answer 5 · answered Aug 13 '18 at 20:51

In base R:

fun <- function(df, x) {
  # subtract every column except the first from column x (given as a string)
  df[paste0(names(df)[-1], "dt")] <- df[[x]] - df[-1]
  df
}
fun(df, "g3")
#   Sample g1 g2 g3 g4 g5 g1dt g2dt g3dt g4dt g5dt
# 1   s001  5 10 15 20 25   10    5    0   -5  -10
# 2   s002  6 11 16 21 26   10    5    0   -5  -10
# 3   s003  7 12 17 22 27   10    5    0   -5  -10
# 4   s004  8 13 18 23 28   10    5    0   -5  -10

data

df <- read.table(text="Sample  g1   g2   g3    g4   g5 
s001    5    10   15    20   25
s002    6    11   16    21   26
s003    7    12   17    22   27
s004    8    13   18    23   28", stringsAsFactors = FALSE, header = TRUE)
