How to subset multiple columns condition in R?

Question

All,

My dataset looks like the following. I am trying to answer the question below.

Question:

Based on the drawing paper data ONLY, do the stores sell more units (units.sold column) of one paper subtype (paper.type) than of the others?

To answer this, I used the tapply function, which gave me sums for both kinds of paper. Now I am not sure how to proceed to get only the drawing paper data. Any help is appreciated!

My code

tapply(df$units.sold,list(df$paper,df$paper.type,df$store),sum)

Dataset

             date year     rep     store paper          paper.type  unit.price   units.sold total.sale
9991  12/30/2015 2015     Ran    Dublin watercolor      sheet       0.77          5       3.85
9992  12/30/2015 2015     Ran    Dublin    drawing       pads      10.26          1      10.26
9993  12/30/2015 2015  Arijit  Syracuse watercolor        pad      12.15          2      24.30
9994  12/30/2015 2015  Thomas Davenport    drawing       roll      20.99          1      20.99
9995  12/31/2015 2015   Ruisi    Dublin watercolor      sheet       0.77          7       5.39
9996  12/31/2015 2015   Mohit Davenport    drawing       roll      20.99          1      20.99
9997  12/31/2015 2015    Aman  Portland    drawing       pads      10.26          1      10.26
9998  12/31/2015 2015 Barakat  Portland watercolor      block      19.34          1      19.34
9999  12/31/2015 2015  Yunzhu  Syracuse    drawing    journal      24.94          1      24.94
10000 12/31/2015 2015    Aman  Portland watercolor      block      19.34          1      19.34

Note: I am new to R. Please provide an explanation along with your code.

score 3 · Answer 1 · answered Feb 02 '19 at 14:08

Use dplyr from the tidyverse, starting with its filter function. You can chain functions together using the %>% pipe operator.

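library(dplyr)

df2 <- df %>% 
  filter(paper == "drawing") %>%          # keep only the drawing paper rows
  group_by(store, paper.type) %>%         # one group per store and subtype
  summarise(units.sold = sum(units.sold)) # total units sold in each group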

  store     paper.type units.sold
  <chr>     <chr>           <dbl>
1 Davenport roll                2
2 Dublin    pads                1
3 Portland  pads                1
4 Syracuse  journal             1
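
To get at the original question (is one subtype sold more than the others?), it may help to drop store from the grouping and total by paper.type alone. The following is only a sketch: it rebuilds the ten sample rows from the question with just the columns it needs, and the name by_type is made up for illustration.

```r
library(dplyr)

# The ten sample rows from the question, reduced to the columns used here
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# by_type is a hypothetical name: total units per drawing subtype, largest first
by_type <- df %>%
  filter(paper == "drawing") %>%
  group_by(paper.type) %>%
  summarise(units.sold = sum(units.sold)) %>%
  arrange(desc(units.sold), paper.type)   # ties are broken alphabetically

print(by_type)
```

On these ten rows, pads and roll tie at 2 units each and journal has 1, so the sample alone does not single out one subtype; the full dataset may tell a different story.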

Thanks! found dplyr as another way of filtering my dataset! – Data_is_Power Feb 02 '19 at 14:14

score 1 · Accepted Answer · answered Feb 02 '19 at 13:53

You could start by aggregating the units.sold column by store and paper.type:

aggregate(units.sold~store+paper.type, df[df$paper == "drawing", ], sum)

#      store paper.type units.sold
#1  Syracuse    journal          1
#2    Dublin       pads          1
#3  Portland       pads          1
#4 Davenport       roll          2

Here we filter the data to keep only the "drawing" rows. From this output we can compare the units.sold for each store and paper.type.
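
The same row filter also works with the tapply call from the question: subset first, then tabulate. This is a rough sketch rather than the only way to do it. It assumes character (not factor) columns, as in the sample data, and the names drawing, by_type_store and by_type are invented for the example.

```r
# The ten sample rows from the question, reduced to the columns used here
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep only the drawing rows before tabulating (hypothetical name)
drawing <- df[df$paper == "drawing", ]

# paper.type x store matrix of summed units; NA marks combinations with no rows
by_type_store <- tapply(drawing$units.sold,
                        list(drawing$paper.type, drawing$store), sum)

# Totals per paper subtype across all stores
by_type <- tapply(drawing$units.sold, drawing$paper.type, sum)

print(by_type_store)
print(by_type)
```

If paper.type were a factor, the watercolor-only levels would survive the subset and show up as NA entries; wrapping the subset in droplevels() is one way to avoid that.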

Ronak Shah

Thank you for responding! How would I go about aggregating stores and paper type based on units sold? – Data_is_Power Feb 02 '19 at 14:03
@Data_is_Power the code does exactly that. For each `store` and `paper.type`, it takes the `sum` of the units sold for only the "drawing" type of paper. – Ronak Shah Feb 02 '19 at 14:07
I was looking for the same solution. – Data_is_Power Feb 02 '19 at 14:15

score 1 · Answer 3 · answered Feb 02 '19 at 15:16

We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT, specify the i expression (paper == 'drawing') to subset the rows, group by 'store' and 'paper.type', and summarise 'units.sold' by taking its sum.

library(data.table)
setDT(df)[paper == "drawing", .(units.sold = sum(units.sold)), .(store, paper.type)]
#       store paper.type units.sold
#1:    Dublin       pads          1
#2: Davenport       roll          2
#3:  Portland       pads          1
#4:  Syracuse    journal          1
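
The rows above come out in order of first appearance, which is how by behaves in data.table. If a sorted result is easier to compare, keyby is one option. A small sketch, assuming the same ten sample rows; it builds a fresh table named dt (a made-up name) instead of converting df in place with setDT.

```r
library(data.table)

# The ten sample rows from the question, reduced to the columns used here
dt <- data.table(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L)
)

# keyby groups like by, but also sorts the result by the grouping columns
res <- dt[paper == "drawing",
          .(units.sold = sum(units.sold)),
          keyby = .(store, paper.type)]

print(res)
```

With keyby the stores come back as Davenport, Dublin, Portland, Syracuse, the same order as the dplyr output in the other answer.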

data

df <-  structure(list(date = c("12/30/2015", "12/30/2015", "12/30/2015", 
"12/30/2015", "12/31/2015", "12/31/2015", "12/31/2015", "12/31/2015", 
"12/31/2015", "12/31/2015"), year = c(2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L), rep = c("Ran", "Ran", 
"Arijit", "Thomas", "Ruisi", "Mohit", "Aman", "Barakat", "Yunzhu", 
"Aman"), store = c("Dublin", "Dublin", "Syracuse", "Davenport", 
"Dublin", "Davenport", "Portland", "Portland", "Syracuse", "Portland"
), paper = c("watercolor", "drawing", "watercolor", "drawing", 
"watercolor", "drawing", "drawing", "watercolor", "drawing", 
"watercolor"), paper.type = c("sheet", "pads", "pad", "roll", 
"sheet", "roll", "pads", "block", "journal", "block"), unit.price = c(0.77, 
10.26, 12.15, 20.99, 0.77, 20.99, 10.26, 19.34, 24.94, 19.34), 
    units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L), total.sale = c(3.85, 
    10.26, 24.3, 20.99, 5.39, 20.99, 10.26, 19.34, 24.94, 19.34
    )), class = "data.frame", row.names = c("9991", "9992", "9993", 
"9994", "9995", "9996", "9997", "9998", "9999", "10000"))
