I have a dataframe df with three columns: TASK, CONDITION, and SCORE. I want to represent the data:

  1. as barplots (I'm using geom_col)
  2. with a separate plot for each TASK (I'm using facet_wrap(~TASK))
  3. with a separate bar for each CONDITION (I'm using ggplot(df, aes(x=CONDITION)))

Additionally, bars whose values sum to the same percentage should have the same color. Unfortunately, I can't get that to work.

In the minimal example below, three bars reach 100%, so I expect all three to be blue, as specified by high="blue", but that is not what happens.

Input =("
TASK CONDITION SCORE
GAU   0         0.25
GAU   0         0.25
GAU   0         0.25
GAU   0         0.25
GAU   1         0.2
GAU   1         0.2
GAU   1         0.2
GAU   1         0.2
GAU   1         0.2
PLN   0         0.3333
PLN   0         0.3333
PLN   0         0
PLN   1         0.5
PLN   1         0.5
        ")
df <- read.table(textConnection(Input),
                 header=TRUE)
df$CONDITION <- factor(df$CONDITION)

library(ggplot2)
ggplot(df, aes(x=CONDITION, y=SCORE, fill=SCORE)) +
  geom_col() +
  ggtitle("Performance") +
  ylab("Total") +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~TASK) +
  scale_fill_gradient(low="red", high="blue")
gilcail

1 Answer

What's really going on is a bit hidden by the plot. If we put a border on the bars and change the first value, it may become clearer:

df2 <- df
df2[1, "SCORE"] <- .5
ggplot(df2, aes(x=CONDITION, y=SCORE, fill=SCORE)) +
  geom_col(color="black") +
  ggtitle("Performance") +
  ylab("Total") +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~TASK) +
  scale_fill_gradient(low="red", high="blue")

The fill isn't based on the total height of each bar; it's based on each individual observation, because geom_col stacks one segment per row. Notice how your color scale only went up to .5, the largest single SCORE. If you want to do this in ggplot alone, you can use a summary stat with geom_bar to do the summation for you. It would look like this:

# ggplot2 >= 3.3.0 spelling; older versions use fill=..y.. and fun.y="sum"
ggplot(df, aes(x=CONDITION, y=SCORE, fill=after_stat(y))) +
  geom_bar(stat="summary", fun="sum") +
  ggtitle("Performance") +
  ylab("Total") +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~TASK) +
  scale_fill_gradient(low="red", high="blue")
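
To see the numbers behind the two plots, here is a minimal base R sketch. It rebuilds the question's data by hand (the `data.frame` construction and the names `row_range` and `totals` are just for illustration) and compares the per-row values that fill=SCORE maps to color with the per-bar totals that the summed fill uses.

```r
# Rebuild the question's data so the check is self-contained.
df <- data.frame(
  TASK = rep(c("GAU", "PLN"), times = c(9, 5)),
  CONDITION = factor(c(rep(0, 4), rep(1, 5), rep(0, 3), rep(1, 2))),
  SCORE = c(rep(0.25, 4), rep(0.2, 5), 0.3333, 0.3333, 0, 0.5, 0.5)
)

# Per-row values: this is what fill=SCORE maps to color.
row_range <- range(df$SCORE)

# Per-bar totals: this is what the summed fill maps to color.
totals <- aggregate(SCORE ~ TASK + CONDITION, data = df, FUN = sum)

print(row_range)
print(totals)
```

The individual rows never exceed .5, while three of the four TASK/CONDITION totals come out at 1, which is why only the summed version colors those three bars alike.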

MrFlick
  • Exactly what I wanted, thank you! Looks like I should learn about "stat" and "fun" if I want to get more out of ggplot2. – gilcail Jun 20 '18 at 22:08
  • Honestly I wouldn’t bother. I rarely use them. It’s much easier to properly summarize your data with something like dplyr before plotting. – MrFlick Jun 20 '18 at 22:09
  • Hopefully I'm not pushing it but, how would you properly summarize the data in this case? – gilcail Jun 20 '18 at 22:18
  • With dplyr it would be like `df %>% group_by(TASK, CONDITION) %>% summarize_all()` (not tested because I’m not at my computer). – MrFlick Jun 20 '18 at 22:21
  • `summarize_all(sum)` returned the expected output, thanks again! – gilcail Jun 20 '18 at 22:29
  • Oops. Forgot that part. Glad you figured it out. – MrFlick Jun 20 '18 at 22:30
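
For reference, the summarize-first approach from the comments might look like the sketch below. It assumes dplyr 1.0 or later (for the `.groups` argument) and uses `summarise(SCORE = sum(SCORE))`, which for this single numeric column should give the same result as `summarize_all(sum)`. The data are rebuilt by hand, and `totals` and `p` are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Rebuild the question's data so the example is self-contained.
df <- data.frame(
  TASK = rep(c("GAU", "PLN"), times = c(9, 5)),
  CONDITION = factor(c(rep(0, 4), rep(1, 5), rep(0, 3), rep(1, 2))),
  SCORE = c(rep(0.25, 4), rep(0.2, 5), 0.3333, 0.3333, 0, 0.5, 0.5)
)

# One row per TASK/CONDITION pair, holding the summed SCORE.
totals <- df %>%
  group_by(TASK, CONDITION) %>%
  summarise(SCORE = sum(SCORE), .groups = "drop")

# Each bar is now a single row, so fill=SCORE is the bar's total.
p <- ggplot(totals, aes(x = CONDITION, y = SCORE, fill = SCORE)) +
  geom_col() +
  ggtitle("Performance") +
  ylab("Total") +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~TASK) +
  scale_fill_gradient(low = "red", high = "blue")

print(p)
```

Because the plot is built from the summarized table, plain geom_col is enough and no stat or fun argument is needed.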