How do I group the sample data below by Gender and compute the mean of each genre column within each group, considering only values greater than 1? Ultimately I want to plot a bar chart with the genres along the horizontal axis, each genre shown as two bars, one per gender.

Action | Adventure | Animation | Children | Comedy | Gender
0      | 5         | 0         | 0        | 0      | M
0      | 0         | 1         | 2        | 3      | M
2      | 3         | 0         | 0        | 4      | F
0      | 0         | 0         | 2        | 0      | M
4      | 0         | 3         | 0        | 2      | F
4      | 4         | 0         | 0        | 0      | F

I'm aware of various possible ways of approaching this problem, but I'm looking for compact code that can be run directly in ggplot or another plotting function, rather than having to pre-process the data first and then plot the result. However, any smart approach is welcome.

bb5kb
Perhaps you can share some of these various ways, and why they don't suit you. Also, please share the data in a form that people can actually get into their sessions (using `dput`, or at least without a ton of separators in all directions). – Axeman Apr 09 '17 at 18:27
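
For anyone who wants to reproduce this, here is a minimal sketch of the sample as a data frame, in the spirit of the comment above. The values are copied from the table in the question. The name `df` is an assumption, chosen only because the answer below uses it.

```r
# Sample data from the question, entered column by column.
# `df` is an assumed name; the question does not name the object.
df <- data.frame(
  Action    = c(0, 0, 2, 0, 4, 4),
  Adventure = c(5, 0, 3, 0, 0, 4),
  Animation = c(0, 1, 0, 0, 3, 0),
  Children  = c(0, 2, 0, 2, 0, 0),
  Comedy    = c(0, 3, 4, 0, 2, 0),
  Gender    = c("M", "M", "F", "M", "F", "F")
)

str(df)
```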

1 Answer

You could try:

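library(tidyr)    # gather()
library(dplyr)    # filter(), group_by(), summarise()
library(ggplot2)

df %>%
  # reshape to long format: one row per Gender / genre / value
  gather(key, value, -Gender) %>%
  # keep only values greater than 1
  filter(value > 1) %>%
  # mean value for each gender within each genre
  group_by(Gender, key) %>%
  summarise(value = mean(value)) %>%
  # genres on the x-axis, one dodged bar per gender
  ggplot(aes(key, value)) +
  geom_bar(aes(fill = Gender),
           position = "dodge", stat = "identity")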

Which gives:

[Image: dodged bar chart with the genres on the x-axis and the mean value on the y-axis, one bar per gender]
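
Since the question asks for the means to be computed inside the plotting call, here is a hedged sketch of one alternative: let `stat_summary()` take the mean, so no `group_by()`/`summarise()` step is needed. It is a variation on the pipeline above, not a replacement for it. It assumes tidyr 1.0.0 or later for `pivot_longer()` and ggplot2 3.3.0 or later for the `fun` argument. The names `df`, `long`, `genre` and `p` are illustrative, and the data are re-entered from the question so the block runs on its own.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Sample data from the question (`df` is an assumed name).
df <- data.frame(
  Action    = c(0, 0, 2, 0, 4, 4),
  Adventure = c(5, 0, 3, 0, 0, 4),
  Animation = c(0, 1, 0, 0, 3, 0),
  Children  = c(0, 2, 0, 2, 0, 0),
  Comedy    = c(0, 3, 4, 0, 2, 0),
  Gender    = c("M", "M", "F", "M", "F", "F")
)

# Long format, keeping only values greater than 1.
long <- df %>%
  pivot_longer(-Gender, names_to = "genre", values_to = "value") %>%
  filter(value > 1)

# stat_summary() computes the mean per genre and gender at draw time.
p <- ggplot(long, aes(genre, value, fill = Gender)) +
  stat_summary(fun = mean, geom = "bar", position = "dodge")

print(p)
```

Some genre and gender combinations have no values above 1 (for example Action for M), so those genres are drawn with a single bar.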

Steven Beaupré