My data set has one column for product type and one for purchase quantity. For each row, I would like to subtract the average purchase quantity of that row's product type from the row's actual purchase quantity.

My data set looks roughly like this:

library(dplyr)
set.seed(42)
product <- paste("prod - " , sample(c("A", "B", "C", "D"), size = 15, 
                                replace = TRUE))
purch <- sample(5:10, size = 15, replace = TRUE)

fake_data <- tibble(product, purch)

I can do this using a split-apply-combine method as follows:

data_s <- split(fake_data, fake_data$product) #split
data_a <- lapply(data_s, function(m) cbind(m, m$purch - mean(m$purch))) #apply
data_c <- bind_rows(data_a) #combine

This works, but it sits right in the middle of an otherwise long, well-organized chain built with %>% and dplyr. Is there a way to do this in dplyr so that I can get what I need without breaking the chain?

Thank you.

Nick Criswell

1 Answer

library(dplyr)
fake_data %>%
  group_by(product) %>%
  mutate(NewVal = purch - mean(purch)) %>%
  arrange(product)
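Inside a grouped mutate(), mean(purch) is evaluated once per product group, so each row is centered on its own group's mean. The sketch below is one way this might look as a self-contained script. It rebuilds the question's data, and it adds an ungroup() step that is not part of the answer as posted: that step is an assumption about what the rest of a longer chain would want, since the grouping otherwise carries over to later verbs. The name `result` and the final sanity check are also illustrative additions.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(42)
product <- paste("prod - ", sample(c("A", "B", "C", "D"), size = 15,
                                   replace = TRUE))
purch <- sample(5:10, size = 15, replace = TRUE)
fake_data <- tibble(product, purch)

result <- fake_data %>%
  group_by(product) %>%                     # split by product type
  mutate(NewVal = purch - mean(purch)) %>%  # mean() is computed per group
  ungroup() %>%                             # assumption: later steps should not stay grouped
  arrange(product)

# Sanity check: deviations from a group mean should sum to (about) zero per group
group_sums <- tapply(result$NewVal, result$product, sum)
stopifnot(all(abs(group_sums) < 1e-9))
```

If the later steps in the chain do need the grouping, the ungroup() line can simply be dropped.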
Sumedh