
All,

My dataset looks like the following. I am trying to answer the question below.

Question:

Based on the drawing paper data ONLY, do the stores sell more units (units.sold column) of one paper subtype (paper.type) than of the others?

To answer this question I used the tapply function, which gave me the sums for both kinds of paper (drawing and watercolor). Now I am not sure how to proceed to get only the drawing paper data. Any help is appreciated!

My code

tapply(df$units.sold,list(df$paper,df$paper.type,df$store),sum)
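One way to continue from the tapply call above is sketched here. It assumes `df` holds the columns shown in the sample below, rebuilt with only the four columns the sketch needs. There are two options: subset to the drawing rows before calling tapply, or slice the "drawing" layer out of the original three-way result. The names `drawing`, `by_type_store`, `type_totals`, `full` and `drawing_slice` are illustrative, not from the question.

```r
# Rebuild the four relevant columns of the 10-row sample shown below
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Option 1: keep only drawing-paper rows, then call tapply
drawing <- df[df$paper == "drawing", ]
by_type_store <- tapply(drawing$units.sold,
                        list(drawing$paper.type, drawing$store), sum)
print(by_type_store)  # rows = paper.type, columns = store; NA = no sales

# Total units per paper.type across all stores (NA cells skipped)
type_totals <- rowSums(by_type_store, na.rm = TRUE)
print(type_totals)

# Option 2: slice the "drawing" layer out of the original 3-way array
full <- tapply(df$units.sold, list(df$paper, df$paper.type, df$store), sum)
drawing_slice <- full["drawing", , ]  # paper.type x store matrix
```

With Option 2, the watercolor-only subtypes (block, pad, sheet) still appear as rows that are entirely NA. That is why subsetting first is usually tidier. On this 10-row sample, pads and roll both total 2 units, so the sample alone does not single out one subtype.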

Dataset

             date year     rep     store paper          paper.type  unit.price   units.sold total.sale
9991  12/30/2015 2015     Ran    Dublin watercolor      sheet       0.77          5       3.85
9992  12/30/2015 2015     Ran    Dublin    drawing       pads      10.26          1      10.26
9993  12/30/2015 2015  Arijit  Syracuse watercolor        pad      12.15          2      24.30
9994  12/30/2015 2015  Thomas Davenport    drawing       roll      20.99          1      20.99
9995  12/31/2015 2015   Ruisi    Dublin watercolor      sheet       0.77          7       5.39
9996  12/31/2015 2015   Mohit Davenport    drawing       roll      20.99          1      20.99
9997  12/31/2015 2015    Aman  Portland    drawing       pads      10.26          1      10.26
9998  12/31/2015 2015 Barakat  Portland watercolor      block      19.34          1      19.34
9999  12/31/2015 2015  Yunzhu  Syracuse    drawing    journal      24.94          1      24.94
10000 12/31/2015 2015    Aman  Portland watercolor      block      19.34          1      19.34

Note: I am new to R. Please provide an explanation along with your code.

Data_is_Power

3 Answers


Use dplyr from the tidyverse and start with its filter function. You can chain functions together using the %>% pipe operator.

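library(dplyr)  # provides filter(), group_by(), summarise() and %>%

df2 <- df %>%
  filter(paper == "drawing") %>%              # keep only drawing-paper rows
  group_by(store, paper.type) %>%             # one group per store/subtype pair
  summarise(units.sold = sum(units.sold))     # total units sold in each group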

  store     paper.type units.sold
  <chr>     <chr>           <dbl>
1 Davenport roll                2
2 Dublin    pads                1
3 Portland  pads                1
4 Syracuse  journal             1
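The question asks about subtypes overall, so a possible next step is to drop store from the grouping (in dplyr, group_by(paper.type) alone) and compare the totals. The sketch below does this in base R with rowsum(), so it runs without dplyr. It rebuilds only the drawing rows of the sample, and the names `drawing`, `totals` and `top_types` are illustrative. It compares against max() rather than using which.max(), because which.max() reports only the first of several tied subtypes.

```r
# Drawing-paper rows from the question's sample (store is not needed here)
drawing <- data.frame(
  paper.type = c("pads", "roll", "roll", "pads", "journal"),
  units.sold = c(1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Sum units.sold per paper.type; rowsum() returns a one-column matrix,
# so take its first column to get a named vector
totals <- rowsum(drawing$units.sold, drawing$paper.type)[, 1]
print(totals)

# Every subtype that reaches the maximum (more than one means a tie)
top_types <- names(totals)[totals == max(totals)]
print(top_types)
```

On the 10-row sample this returns both "pads" and "roll" (2 units each), so the sample shows no single best-selling drawing subtype. The full dataset may well differ.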
nycrefugee

You could start by aggregating the units.sold column by store and paper.type:

aggregate(units.sold~store+paper.type, df[df$paper == "drawing", ], sum)

#      store paper.type units.sold
#1  Syracuse    journal          1
#2    Dublin       pads          1
#3  Portland       pads          1
#4 Davenport       roll          2

Here we filter the data to keep only the "drawing" paper. Based on this output, we can compare units.sold for each store and paper.type.
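A related base R option is xtabs(). It computes the same store-by-subtype sums but lays them out as a cross-table, and combinations with no sales appear as 0 instead of being left out. addmargins() then appends row and column totals, so each subtype's overall total can be read off directly. This is offered as an alternative, not as part of the answer above. The sketch rebuilds the drawing rows of the sample; `drawing` and `tab` are illustrative names.

```r
# Drawing-paper rows from the question's sample
drawing <- data.frame(
  store = c("Dublin", "Davenport", "Davenport", "Portland", "Syracuse"),
  paper.type = c("pads", "roll", "roll", "pads", "journal"),
  units.sold = c(1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Cross-table of summed units.sold: paper.type in rows, store in columns
tab <- xtabs(units.sold ~ paper.type + store, data = drawing)
print(tab)

# Append a "Sum" row and column; the Sum column holds each subtype's total
print(addmargins(tab))
```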

Ronak Shah

We can use data.table. Convert the data.frame to a data.table with setDT, use the i expression (paper == "drawing") to subset the rows, and summarise units.sold by taking its sum, grouped by store and paper.type:

library(data.table)
setDT(df)[paper == "drawing", .(units.sold = sum(units.sold)), .(store, paper.type)]
#       store paper.type units.sold
#1:    Dublin       pads          1
#2: Davenport       roll          2
#3:  Portland       pads          1
#4:  Syracuse    journal          1

data

df <-  structure(list(date = c("12/30/2015", "12/30/2015", "12/30/2015", 
"12/30/2015", "12/31/2015", "12/31/2015", "12/31/2015", "12/31/2015", 
"12/31/2015", "12/31/2015"), year = c(2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L), rep = c("Ran", "Ran", 
"Arijit", "Thomas", "Ruisi", "Mohit", "Aman", "Barakat", "Yunzhu", 
"Aman"), store = c("Dublin", "Dublin", "Syracuse", "Davenport", 
"Dublin", "Davenport", "Portland", "Portland", "Syracuse", "Portland"
), paper = c("watercolor", "drawing", "watercolor", "drawing", 
"watercolor", "drawing", "drawing", "watercolor", "drawing", 
"watercolor"), paper.type = c("sheet", "pads", "pad", "roll", 
"sheet", "roll", "pads", "block", "journal", "block"), unit.price = c(0.77, 
10.26, 12.15, 20.99, 0.77, 20.99, 10.26, 19.34, 24.94, 19.34), 
    units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L), total.sale = c(3.85, 
    10.26, 24.3, 20.99, 5.39, 20.99, 10.26, 19.34, 24.94, 19.34
    )), class = "data.frame", row.names = c("9991", "9992", "9993", 
"9994", "9995", "9996", "9997", "9998", "9999", "10000"))
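The structure(...) code above is the kind of text dput() prints for a data frame. That is why it can be pasted into R to recreate the data exactly. The sketch below shows that round trip on two drawing rows from the sample; the names `small`, `txt` and `small_copy` are illustrative.

```r
# Two drawing rows from the question's data
small <- data.frame(
  store = c("Dublin", "Davenport"),
  paper.type = c("pads", "roll"),
  units.sold = c(1L, 1L),
  stringsAsFactors = FALSE
)

# dput() writes R code that recreates the object; capture it as text
txt <- capture.output(dput(small))
cat(txt, sep = "\n")

# Parsing and evaluating that text rebuilds an identical data frame
small_copy <- eval(parse(text = txt))
print(identical(small, small_copy))
```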
akrun