2

I have a data frame with columns labeled sales1, sales2, price1, price2 and I want to calculate revenues by multiplying sales1 * price1 and so-on across each number in an iterative fashion.

data <- data_frame(
  "sales1" = c(1, 2, 3),
  "sales2" = c(2, 3, 4),
  "price1" = c(3, 2, 2),
  "price2" = c(3, 3, 5))

data
# A tibble: 3 x 4
#  sales1 sales2 price1 price2
#   <dbl>  <dbl>  <dbl>  <dbl>
#1      1      2      3      3
#2      2      3      2      3
#3      3      4      2      5

Why doesn't the following code work?

data %>%
  mutate (
    for (i in seq_along(1:2)) {
      paste0("revenue",i) = paste0("sales",i) * paste0("price",i)
    }
  )
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Gulbas
  • 55
  • 4
  • take a look at [this](https://stackoverflow.com/questions/46832436/dplyr-mutate-formula-based-on-similarities-in-column-names/46836700#46836700) – acylam Nov 02 '17 at 03:46

2 Answers2

3

Assuming your columns are already ordered (sales1, sales2, price1, price2). We can split the dataframe in two parts and then multiply them

data[grep("sales", names(data))] * data[grep("price", names(data))]

#  sales1 sales2
#1      3      6
#2      4      9
#3      6     20

If the columns are not already sorted according to their names, we can sort them by using order and then use above command.

data <- data[order(names(data))]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

This answer is not brief. For that, @RonakShah's existing answer is the one to look at!

My response is intended to address a broader concern regarding the difficulty of trying to do this in the tidyverse. My understanding is this is difficult because the data is not currently in a "tidy" format. Instead, you can create a tidy data frame like so:

library(tidyverse)

tidy_df <- data %>% 
  rownames_to_column() %>%
  gather(key, value, -rowname) %>% 
  extract(key, c("variable", "id"), "([a-z]+)([0-9]+)") %>%
  spread(variable, value)

Which then makes the final calculation straightforward

tidy_df %>% mutate(revenue = sales * price)

#> # A tibble: 6 x 5
#>   rowname    id price sales revenue
#>     <chr> <chr> <dbl> <dbl>   <dbl>
#> 1       1     1     3     1       3
#> 2       1     2     3     2       6
#> 3       2     1     2     2       4
#> 4       2     2     3     3       9
#> 5       3     1     2     3       6
#> 6       3     2     5     4      20

If you need to get the data back into the original format you can although this feels clunky to me (I'm sure this can be improved in someway).

tidy_df %>% mutate(revenue = sales * price) %>%
  gather(key, value, -c(rowname, id)) %>%
  unite(key, key, id, sep = "") %>%
  spread(key, value) %>% 
  select(starts_with("price"), 
         starts_with("sales"),
         starts_with("revenue"))

#> # A tibble: 3 x 6
#>   price1 price2 sales1 sales2 revenue1 revenue2
#> *  <dbl>  <dbl>  <dbl>  <dbl>    <dbl>    <dbl>
#> 1      3      3      1      2        3        6
#> 2      2      3      2      3        4        9
#> 3      2      5      3      4        6       20
markdly
  • 4,394
  • 2
  • 19
  • 27
  • I see the problem in not having the data structured as you have described. See below another example df for context `record <- c(seq_along(1:10)) store <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5) week <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2) sales_1 <- c(3, 3, 3, 3, 3, 2, 5, 1, 2, 10) sales_2 <- c(1, 2, 4, 5, 6, 2, 3, 6, 1, 8) price_1 <- runif(10, 2, 6) price_2 <- runif(10, 2, 6) df <- data_frame(record, store, week, sales_1, sales_2, price_1, price_2)` I'm having difficulty condensing the 'sales' & 'price' columns using the straw man code. Can you perhaps walk me through using this example? – Gulbas Nov 02 '17 at 18:26
  • @Gulbas, that sounds like a new question to me :). If you post it as one I'll attempt to provide an answer (and you may well find others solutions which are better). You could always link back to this question in your new question to give other users some context on what sort of answer you are looking for – markdly Nov 02 '17 at 22:49