-1

I am trying to build a pivot table using R for my excel dataset. I need to group the numbers(which are in a column called, weights ranging from 70-100. There is a price associated with each of these weights. I need to find the mean(weights), max(weights), min(weights) and the count of products in each weight category. There are about 3000 obs. of 25 variables. Weights and Price are two of them. Snippet of the data:

Weight   Price   Order No.   Date_Ordered   Invoiced_Date   Region  
85       $2300 
78       $5600
100      $3490
95       $2450
90       $5890

I am looking for something like:
    Weight                       Count    Mean(Price)   Min(Price)   Max(Price)
70-75(including 75)     
75-80
80-85
85-90
90-95
95-100

I am able to get the count but I am not able to get the mean, min and max for each weight category:

#Import the dataset
dataset = read.xlsx('Product_Data.xlsx')
gdataset <- group_by(dataset, Weight)
attach(gdataset)
periods <- seq(from = 70, to = 100, by 5)
snip < -cut(Weight, breaks = periods, right = TRUE, include.lowest = TRUE)
report <- cbind(table(snip))
victor8910
  • 193
  • 1
  • 8
  • Hi there, welcome to Stack Overflow. To help others answer your question, please consider editing it to add an minimum reproducible example. See http://stackoverflow.com/help/mcve – HAVB Jan 22 '18 at 22:32

1 Answers1

2

Your data is a bit sparse, so I'll create my own for this answer. I'll ignore the other columns, though presence in the data should not impact anything.

set.seed(2)
n <- 100
dat <- data.frame(
  Weight = sample(100, size=n, replace=TRUE),
  Price = sample(9999, size=n, replace=TRUE)
)
head(dat)
#   Weight Price
# 1     19  2010
# 2     71  4276
# 3     58  9806
# 4     17  8289
# 5     95  2870
# 6     95  5959

Base R

The first thing to realize is that you need to group the data into bins. In R, this is easily done with cut.

bins <- seq(0, 100, by=5)
dat$WeightBin <- cut(dat$Weight, breaks = bins)
head(dat)
#   Weight Price WeightBin
# 1     19  2010   (15,20]
# 2     71  4276   (70,75]
# 3     58  9806   (55,60]
# 4     17  8289   (15,20]
# 5     95  2870   (90,95]
# 6     95  5959   (90,95]

Now we just break it up into groups and run a simple summarization function on each group, wraping it back into a data.frame:

do.call(rbind, by(dat$Price, dat$WeightBin, function(x) {
  setNames(
    sapply(c(length, mean, min, max), function(f) f(x)),
    c("Count", "Mean(Price)", "Min(Price)", "Max(Price)")
  )
}))
#          Count Mean(Price) Min(Price) Max(Price)
# (0,5]        5    3919.000       1822       9536
# (5,10]       3    4287.000       1782       5690
# (10,15]      5    5402.200       2739       8989
# (15,20]     11    5192.545       1183       9192
# (20,25]      3    2868.667        137       7363
# (25,30]      6    6594.500       2855       9657
# (30,35]      5    2960.200        777       7486
# (35,40]      6    4937.000        850       9749
# (40,45]      7    5986.000       1307       9527
# (45,50]      4    5957.750       1475       9754
# (50,55]      3    3077.333       1287       4786
# (55,60]      4    4285.500        247       9806
# (60,65]      3    2633.000        450       6656
# (65,70]      4    4244.250        369       9038
# (70,75]      3    2616.333        652       4276
# (75,80]      5    7183.800       3734       8537
# (80,85]      6    4273.667        229       9788
# (85,90]      6    6659.000       1388       9637
# (90,95]      4    4301.750       2870       5959
# (95,100]     7    3967.857        872       8727

dplyr

I'm inferring from the presence of group_by that you are intending to use dplyr. Here's an alternative to get similar results (starting with my original data):

library(dplyr)
dat %>%
  group_by(Bin = cut(Weight, seq(0, 100, by=5))) %>%
  summarize(
    Count = n(),
    Mean = mean(Price),
    Min = min(Price),
    Max = max(Price)
  )
r2evans
  • 141,215
  • 6
  • 77
  • 149