1

Suppose I've got sales data for three sales reps selling a variety of products. What makes it difficult is that each rep sells a different combination of products, and not necessarily the same number of them, either:

Bob sells Products A, B, and C

Mike sells Products A, B, C, and D

Sara sells Products A, B, and E

   RepName Product SalesDollarAmt SalesQty
 1     Bob       A             43        3
 2    Mike       A             14        5
 3    Sara       A             53        1
 4     Bob       B            100       35
 5    Mike       B            215       80
 6    Sara       B            310      105
 7     Bob       C              5        8
 8    Mike       C             10        3
 9    Mike       D            105       50
10    Sara       E             25       18

I want to rotate this on the Product, so that the result looks like this:

  RepName Product.1 SalesDollarAmt.1 SalesQty.1 Product.2 SalesDollarAmt.2 SalesQty.2 Product.3 SalesDollarAmt.3 SalesQty.3 Product.4 SalesDollarAmt.4 SalesQty.4
1     Bob         A               43          3         B              100         35         C                5          8      <NA>                0          0
2    Mike         A               14          5         B              215         80         C               10          3         D              105         50
3    Sara         A               53          1         B              310        105         E               25         18      <NA>                0          0

If they all had the same products, I'd have filtered them by Product into separate dataframes and then joined them back together on RepName. I've tried everything I can think of with spread and dcast. Thanks for any help!

Code for sample data frames:

library(tidyverse)

# initial sales data
df <- tribble(
  ~RepName, ~Product, ~SalesDollarAmt, ~SalesQty,
               #-------------------------------
               "Bob", "A", 43, 3,
               "Mike", "A", 14, 5,
               "Sara", "A", 53, 1,
               "Bob", "B", 100, 35,
               "Mike", "B", 215, 80,
               "Sara", "B", 310, 105,
               "Bob", "C", 5, 8,
               "Mike", "C", 10, 3,
               "Mike", "D", 105, 50,
               "Sara", "E", 25, 18
                )

# ideally rotated data
df2 <- tribble(
  ~RepName, ~Product.1, ~SalesDollarAmt.1, ~SalesQty.1, ~Product.2, ~SalesDollarAmt.2, ~SalesQty.2, ~Product.3, ~SalesDollarAmt.3, ~SalesQty.3, ~Product.4, ~SalesDollarAmt.4, ~SalesQty.4,
  #--------------------------------------------------------------
  "Bob", "A", 43, 3, "B", 100, 35, "C", 5, 8, NA, 0, 0, 
  "Mike", "A", 14, 5, "B", 215, 80, "C", 10, 3, "D", 105, 50,
  "Sara", "A", 53, 1, "B", 310, 105, "E", 25, 18, NA, 0, 0 
)
  • is it true that for each `RepName` and `Product` there will only be one row? (e.g. will Bob only have one row for Product A?) – bouncyball Nov 01 '17 at 15:43
  • 1
    Yes, this is correct; this is sales data for one period only so there will be no such duplication. – ThingyBlahBlah3 Nov 01 '17 at 15:46
  • It seems as if you're converting your data into an untidy format from a tidy one. Can I ask what the reason is? Sometimes there's a better solution to the underlying problem. –  Nov 01 '17 at 16:31
  • I need to pass the output off to someone else who generates reports with a creaky old Excel macro, and it works a lot better for him to have it spread out this way; the macro loops through the rows and uses VLOOKUPs to create each rep's report one at a time. – ThingyBlahBlah3 Nov 03 '17 at 17:41

2 Answers2

1

Using a combination of row_number, gather, spread, and unite, we can reshape the data. It's up to you to reorder the columns if you so choose. In the last line, we specify convert = TRUE in the call to spread. This is due to the fact that when we convert the data to long format (using gather), the column values are converted to character. Specifying convert = TRUE in the call to spread (should) get the values back to a useful form.

df %>%
  group_by(RepName) %>%
  mutate(product_count = row_number()) %>% # product "id" within RepName
  gather(variable, value, -RepName, -product_count) %>% # reshape to long
  unite(var_prod, variable, product_count) %>%
  spread(var_prod, value, convert = TRUE) # reshape to wide

  RepName Product_1 Product_2 Product_3 Product_4 SalesDollarAmt_1 SalesDollarAmt_2 SalesDollarAmt_3 SalesDollarAmt_4 SalesQty_1 SalesQty_2 SalesQty_3 SalesQty_4
1     Bob         A         B         C      <NA>               43              100                5             <NA>          3         35          8       <NA>
2    Mike         A         B         C         D               14              215               10              105          5         80          3         50
3    Sara         A         B         E      <NA>               53              310               25             <NA>          1        105         18       <NA>
bouncyball
  • 10,631
  • 19
  • 31
0

The question is tagged with dcast, so I feel obliged to post a solution which uses dcast().

The data.table version of dcast() can reshape multiple value columns simultaneously which is exactly what we need here. In addition, the rowid() function is used to fill the columns individually for each RepName:

library(data.table)
cast(setDT(df), RepName ~ rowid(RepName), value.var = c("Product", "SalesDollarAmt", "SalesQty"))
   RepName Product_1 Product_2 Product_3 Product_4 SalesDollarAmt_1 SalesDollarAmt_2 SalesDollarAmt_3 SalesDollarAmt_4 SalesQty_1 SalesQty_2 SalesQty_3 SalesQty_4
1:     Bob         A         B         C        NA               43              100                5               NA          3         35          8         NA
2:    Mike         A         B         C         D               14              215               10              105          5         80          3         50
3:    Sara         A         B         E        NA               53              310               25               NA          1        105         18         NA

Edit: Improved version with columns in requested order

In a comment, the OP has disclosed that the reshaping is required because the data will be further processed by an Excel macro. Usually, the position of columns is crucial for Excel formulae.

Therefore the variant below is reordering the columns so that all columns which belong to one product are grouped together:

library(data.table)
# value columns
val <- c("Product", "SalesDollarAmt", "SalesQty")
# create vector of column names in the expected order
col_order <- setDT(df)[, .N, by = RepName][, CJ(seq_len(max(N)), val)][, paste(V2, V1, sep = "_")]
dcast(df, RepName ~ rowid(RepName), value.var = val)[
  #re-order columns in place, i.e., without copying
  , setcolorder(.SD, c("RepName", col_order))]
   RepName Product_1 SalesDollarAmt_1 SalesQty_1 Product_2 SalesDollarAmt_2 SalesQty_2 Product_3 SalesDollarAmt_3 SalesQty_3 Product_4 SalesDollarAmt_4 SalesQty_4
1:     Bob         A               43          3         B              100         35         C                5          8        NA               NA         NA
2:    Mike         A               14          5         B              215         80         C               10          3         D              105         50
3:    Sara         A               53          1         B              310        105         E               25         18        NA               NA         NA
Community
  • 1
  • 1
Uwe
  • 41,420
  • 11
  • 90
  • 134