1

Thanks to @Frank with my previous post (more details there), where I was able to use to answer some questions about a dataset on people's drinking patterns in bars:

bar_name,person,drink_ordered,times_ordered,liked_it
Moe’s Tavern,Homer,Romulan ale,2,TRUE
Moe’s Tavern,Homer,Scotch whiskey,1,FALSE
Moe’s Tavern,Guinan,Romulan ale,1,TRUE
Moe’s Tavern,Guinan,Scotch whiskey,3,FALSE
Moe’s Tavern,Rebecca,Romulan ale,2,FALSE
Moe’s Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1,TRUE
Cheers,Bender,Budweiser,1,FALSE
Cheers,Bender,Black Hole,1,TRUE
Cheers,Krusty,Budweiser,1,TRUE
Cheers,Krusty,Black Hole,1,FALSE
The Hip Joint,Homer,Scotch whiskey,3,FALSE
The Hip Joint,Homer,Corona,1,TRUE
The Hip Joint,Homer,Budweiser,1,FALSE
The Hip Joint,Krusty,Romulan ale,3,TRUE
The Hip Joint,Krusty,Black Hole,4,FALSE
The Hip Joint,Krusty,Corona,1,TRUE
The Hip Joint,Rebecca,Corona,2,TRUE
The Hip Joint,Rebecca,Romulan ale,4,FALSE
The Hip Joint,Bender,Corona,1,TRUE
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Romulan ale,2,TRUE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,
Ten Forward,Krusty,Black Hole,1,FALSE
Mos Eisley,Krusty,Black Hole,1,TRUE
Mos Eisley,Krusty,Corona,2,FALSE
Mos Eisley,Krusty,Romulan ale,1,TRUE
Mos Eisley,Homer,Black Hole,1,TRUE
Mos Eisley,Homer,Corona,2,FALSE
Mos Eisley,Homer,Romulan ale,1,TRUE
Mos Eisley,Bender,Black Hole,1,TRUE
Mos Eisley,Bender,Corona,2,FALSE
Mos Eisley,Bender,Romulan ale,1,TRUE
Quark’s Bar,Bender,Black Hole,1,TRUE
Quark’s Bar,Bender,water,1,FALSE
Quark’s Bar,Bender,unspecified,1,TRUE
Quark’s Bar,Homer,Black Hole,2,FALSE
Quark’s Bar,Guinan,unspecified,2,TRUE
Quark’s Bar,Guinan,Black Hole,1,TRUE
Quark’s Bar,Krusty,Black Hole,1,FALSE
Quark’s Bar,Krusty,water,2,FALSE
Quark’s Bar,Rebecca,unspecified,1,FALSE
Maz’s Tavern,Krusty,water,1,TRUE
Maz’s Tavern,Rebecca,water,1,FALSE
Maz’s Tavern,Homer,water,1,TRUE
Maz’s Tavern,Bender,water,2,FALSE

Specifically, @Frank suggested this code:

DF %>%
  arrange(drink_ordered, times_ordered, liked_it) %>% group_by(bar_name, person) %>%
  summarise(
    Ld   = toString(drink_ordered),
    Ldt  = paste(Ld, toString(times_ordered), sep="_"),
    Ldtl = paste(Ldt, toString(liked_it), sep="_")
  ) %>% 
  group_by(bar_name) %>% 
  summarise_each(funs(n_distinct)) %>%
  mutate_each(funs(. == 1), -person, -bar_name)

Which produces grouped summaries for whether patrons ordered the same drinks in each bar, how many of them, and if they liked them:

#        bar_name person    Ld   Ldt  Ldtl
#           (chr)  (int) (lgl) (lgl) (lgl)
# 1        Cheers      3  TRUE  TRUE FALSE
# 2  Moe’s Tavern      3  TRUE FALSE FALSE
# 3    Mos Eisley      3  TRUE  TRUE  TRUE
# 4   Ten Forward      3 FALSE FALSE FALSE
# 5 The Hip Joint      4 FALSE FALSE FALSE

For this post, however, I've got an additional issue where some people's drink orders were unspecified (in Quark's Bar), and some people ordered water:

  1. For unspecified, I want it to act as a "generic" drink so it would not be counted as a different drink (if other drinks were ordered at that bar). For example, in Quark's Bar I want to result to be TRUE that everyone ordered the same drink. Of course, if in a bar everyone only ordered unspecified, the result would be TRUE, too.

  2. For water, I generally want it to be ignored (e.g. because it's not an alcoholic drink!), so at first I thought I could simply use dplyr's filter() to remove rows of data where the order was water. The complication is that I want the results to be TRUE when the only thing people ordered was water, such as in Maz's Tavern. So I don't think I can simply remove rows with water, I want them to be considered! In other words, I don't want water to count unless it was the only unique thing ever ordered in that bar_name.

Is there a way to conditionally (is that the right term?) deal with "exceptional" items like water or unspecified? I prefer a dplyr (i.e. Hadley-verse) based solution that produces a table like the one @Frank did with the code above which takes those two items into account, though whatever you can think of will be appreciated. Thank you!

hpy
  • 1,989
  • 7
  • 26
  • 56
  • 2
    At `Quark's Bar`, `Bender` and `Guinan` ordered two drinks, one of which is `unspecified`. Are you sure you still want to say that everyone has ordered the same drink? Frank's original logic will say `FALSE` even for `unspecified=Black Hole` since this is a repeat. Also, what if there is more than one `unspecified` order per person? Can it be matched to more than one different types of specified drinks. As you can see, this can get quite complicated. – aichao Oct 08 '16 at 13:22
  • 1
    Also, let's say someone *only* drank water in a bar where not only water was drank (for example let's say `Krusty` only had water in `Quark's bar`). According to your specifications, this row should be ignored. But, if we *remove* this row, this will impact the `person` count (we would completely remove `Krusty` from `Quark's bar`). Is that what you want ? – Steven Beaupré Oct 08 '16 at 14:01
  • Thanks for your comments, this is indeed getting complicated! What if the person count is done separately so we get (1) the number of people who have ordered something at that bar, and summarise (2) in each bar the number of people that ordered each drink. As for `unspecified` (referring to @aichao 's point), in cases of `unspecified=Black Hole` can it be done so that it remains `TRUE` in @Franks's `Ld` column (since everyone ordered the "same" drink), but `FALSE` in the `Ldt` column since people didn't order the same amount of that drink?? – hpy Oct 08 '16 at 15:39

0 Answers0