21

I have the following data frame df:

  v1 v2 v3 v4
1  1  5  7  4
2  2  6 10  3

I want to obtain the following data frame df2 by multiplying columns v1*v3 and v2*v4:

  v1 v2 v3 v4 v1v3 v2v4
1  1  5  7  4    7   20
2  2  6 10  3   20   18

How can I do that using dplyr? Using mutate_each?

I need a solution that can be generalized to a large number of variables and not only 4 (v1 to v4). This is the code to generate the example:

v1 <- c(1, 2)
v2 <- c(5,6)
v3 <- c(7, 10)
v4 <- c(4, 3)
df <- data.frame(v1, v2, v3, v4)
v1v3 <- c(v1 * v3)
v2v4 <- c(v2 * v4)
df2 <- cbind(df, v1v3, v2v4)
sbac
  • `df %>% mutate(v1v3=v1*v3) %>% mutate(v2v4=v2*v4)` – Amit Kohli Nov 09 '16 at 16:05
  • I edited my question. I need an answer that can be generalised to any number of variables without writing them all. – sbac Nov 09 '16 at 16:11
  • So you want to multiply values in alternate columns? – Ronak Shah Nov 09 '16 at 16:19
  • Yes, but the real case that I have is one with 20 variables, not only 4 as in the example. – sbac Nov 09 '16 at 16:22
  • So for 20 variables you need output as `v1*v3*v5*v7...` or `v1*v3`, `v5*v7`, `v9*v11` etc. ? – Ronak Shah Nov 09 '16 at 16:25
  • No. I need output like v1*v11, v2*v12,...v10*v20. – sbac Nov 09 '16 at 16:26
  • Well, now: that is a different question :-D You're essentially asking for someone to code you a loop to generate your dplyr code. You're better off just multiplying your df1 by df2, which, metaphorically speaking, really is an entirely different ball of wax. – leerssej Nov 10 '16 at 18:16
  • Please see my own answer below. – sbac Nov 10 '16 at 18:36
  • Exactly, but that isn't written in tidyverse... ;-) When you ask a question and then change it, all the answers to the first question become 'wrong'. You are better off selecting one of the responses that answered your original question, and then asking your new question in a new thread. Your solution is fine, but it is not as good as some of the things you will get when you ask other folks how to solve that particular problem from the get-go. Then you will get cool new ways to do things you don't already know, like dot products and data frame to data frame multiplications. – leerssej Nov 10 '16 at 19:55

5 Answers

35

You are really close.

df2 <- 
    df %>% 
    mutate(v1v3 = v1 * v3,
           v2v4 = v2 * v4)

Such a beautifully simple language, right?

For more great tricks please see here.

EDIT: Thanks to @Facottons's pointer to this answer: https://stackoverflow.com/a/34377242/5088194, here is a tidy approach to resolving this issue. It saves you from having to hard-code a line for each new column. While it is a bit more verbose than the base R approach, the logic is more immediately transparent and readable. It assumes an even number of columns named v1, v2, ..., with each column in the first half paired with the matching column in the second half.

library(dplyr)
library(tidyr)

# number of products to compute (first half of the columns pairs with the second half)
n_pairs <- ncol(df) / 2

# add a row id so each product can be matched back to its original row
df <- 
    df %>%
    mutate(row_id = row_number())

# convert the data to tidy format and pair the columns to be multiplied together
tidy_df <- 
    df %>%
    gather(column, value, -row_id) %>% 
    mutate(column = as.numeric(sub("v", "", column)),
           pair = if_else(column > n_pairs, column - n_pairs, column))

# compute the product for each row and pair, then spread the products into columns
prod_df <- 
    tidy_df %>% 
    group_by(row_id, pair) %>% 
    summarize(val = prod(value)) %>% 
    ungroup() %>% 
    mutate(pair = paste0("v", pair, "v", pair + n_pairs)) %>% 
    spread(pair, val)

# put the original frame and the product frame together
final_df <- 
    df %>% 
    left_join(prod_df, by = "row_id") %>% 
    select(-row_id)
leerssej
  • Now imagine you had 20 variables (`v1` to `v20`). Could you use `mutate` without writing 10 lines of code? – sbac Nov 10 '16 at 12:04
  • @Facottons - thanks for the poke. I have edited the answer above to include the tidy approach that you suggested. – leerssej Dec 14 '17 at 07:28
4

We can use base R instead of extra packages like dplyr or data.table.

We can use mapply to vectorize the operation over multiple vectors at the same time:

n <- ncol(df)/2
mapply(`*`, df[1:n], df[(n + 1):ncol(df)])

#     v1 v2
#[1,]  7 20
#[2,] 20 18

We can then combine (cbind) this result with your original data frame.
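A minimal sketch of that last step, assuming the columns are ordered so that the first half pairs with the second half (v1 with v3, v2 with v4). The names `first`, `second`, `prods`, and `df2` are illustrative only and do not come from the answer:

```r
# Example data from the question
df <- data.frame(v1 = c(1, 2), v2 = c(5, 6), v3 = c(7, 10), v4 = c(4, 3))

n <- ncol(df) / 2
first  <- names(df)[1:n]               # names of the first half of the columns
second <- names(df)[(n + 1):ncol(df)]  # names of the second half

# Multiply each column in the first half by its partner in the second half.
# With more than one row, mapply simplifies the result to a matrix.
prods <- mapply(`*`, df[first], df[second])

# Name each product after the two columns it came from
colnames(prods) <- paste0(first, second)

# Attach the products to the original data frame
df2 <- cbind(df, prods)
df2
```

Note that with a single-row data frame mapply would simplify to a plain vector rather than a matrix, so this sketch would need adjusting for that case.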


If you are interested in a tidyverse solution, the equivalent in purrr would be a variant of map2:

purrr::map2_df(df[1:n], df[(n + 1):ncol(df)], `*`)

# A tibble: 2 x 2
#     v1    v2
#  <dbl> <dbl>
#1     7    20
#2    20    18
Ronak Shah
  • Yes but I had a special interest in looking at a `dplyr` solution. – sbac Nov 09 '16 at 17:05
  • Is there a specific reason you are looking for a `dplyr` solution ? I am not very much familiar with it. Maybe we can wait, somebody would answer it. – Ronak Shah Nov 09 '16 at 17:40
3

I think I found a solution:

df %>%
  mutate(n = df[1:(ncol(df)/2)] * df[(1+ncol(df)/2):(ncol(df))]) %>% head()

The result is valid for any number of variables. The only remaining problem is the naming of the new variables. This is the result:

  v1 v2 v3 v4 n.v1 n.v2
1  1  5  7  4    7   20
2  2  6 10  3   20   18
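One possible way around the naming problem, sketched here under the assumption that the first half of the columns pairs with the second half, is to add the products one at a time with mutate and build each name explicitly. The helper names `first`, `second`, `new_name`, and `df2` are hypothetical and only serve the illustration:

```r
library(dplyr)

# Example data from the question
df <- data.frame(v1 = c(1, 2), v2 = c(5, 6), v3 = c(7, 10), v4 = c(4, 3))

# Work out the column pairs before any new columns are added
n <- ncol(df) / 2
first  <- names(df)[1:n]
second <- names(df)[(n + 1):ncol(df)]

df2 <- df
for (i in seq_len(n)) {
  # e.g. "v1v3" for the pair (v1, v3)
  new_name <- paste0(first[i], second[i])
  # `!!new_name :=` sets the column name from a string;
  # `.data[[...]]` looks a column up by its name
  df2 <- df2 %>%
    mutate(!!new_name := .data[[first[i]]] * .data[[second[i]]])
}
df2
```

This keeps everything inside mutate, at the cost of a loop; for 20 variables it produces v1v11 through v10v20 without any of them being written out by hand.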
sbac
  • I am not sure how this works for you. It returns me an error `Error: Column \`n\` is of unsupported class data.frame` – Ronak Shah May 23 '19 at 02:47
2

Just use mutate as is, with a comma separating the new columns:

mutate(df, v1v3 = v1 * v3, v2v4 = v2 * v4)

Morgan Ball
0

I just found out!!!

In my case, I did:

mutate(across(starts_with("ratio"), log2, .names = "log2_{.col}"))

This applies a log2 transformation only to the columns whose names start with "ratio". The new columns keep the original names, preceded by the "log2_" prefix.
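To make this concrete, here is a small self-contained sketch. The data frame `ratios` and its columns `id`, `ratio_a`, and `ratio_b` are made up for illustration; only the across pattern itself comes from the answer:

```r
library(dplyr)

# Hypothetical data: two columns start with "ratio", one does not
ratios <- data.frame(id = 1:2, ratio_a = c(2, 8), ratio_b = c(4, 16))

# Apply log2 to every column whose name starts with "ratio",
# keeping the originals and adding log2_-prefixed copies
out <- ratios %>%
  mutate(across(starts_with("ratio"), log2, .names = "log2_{.col}"))

out
```

The `id` column is left untouched, and the result gains the columns `log2_ratio_a` and `log2_ratio_b` alongside the originals.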

ginn