So I have been trying to do a boxplot of "yes/no" counts for hours now.

My dataset looks like this

> stack
         Site Plot Treatment Meters Retrieved
2   Southern    18   Control  -5.00         y
3   Southern    18   Control   9.55         y
4   Southern    18   Control   4.70         y
5   Southern    27   Control  -5.00         y
6   Southern    27   Control  20.00         n
9   Southern    18   Control  -0.10         y
17  Southern    18   Control  20.00         y
23  Southern    31   Control 100.00         y
53  Southern    25        Mu   3.55         n
54  Southern    20        Mu   5.90         y
55  Southern    25        Mu  -0.10         y
56  Southern    29        Mu   9.55         y
58  Southern    25        Mu   4.70         y
60  Southern    20        Mu   2.90         y
61  Southern    24        Mu   5.90         n
62  Southern    24        Mu   3.55         y
63  Southern    20        Mu   3.55         y
65  Southern    24        Mu   0.55         y
66  Southern    29        Mu   8.90         y
68  Southern    25        Mu   8.90         y
69  Southern    29        Mu   0.55         y
70  Southern    24        Mu   1.70         y
72  Southern    29        Mu  -5.00         y
76  Southern    29        Mu   1.70         y
77  Southern    25        Mu   9.55         y
78  Southern    25        Mu  13.20         y
79  Southern    29        Mu   3.55         y
80  Southern    25        Mu  15.00         y
81  Southern    25        Mu  -5.00         n
84  Southern    24        Mu   8.90         y
85  Southern    20        Mu   6.55         y
86  Southern    29        Mu   2.90         y
92  Southern    24        Mu  -0.10         y
93  Southern    20        Mu 100.00         y

I want to get counts of both y(yes) and n(no) of the variable "Retrieved" while grouping for "Treatment" and "Meters".

So it should look something like this

 Treatment Meters        Yes   No
     Control  -5.00         2   0
     Control   9.55         1   2
     Control   4.70         1   1
     Control  20.00         0   2
         Mu   3.55         4   0
         Mu   5.90         0   1
         Mu  -0.10         2   2
         Mu   9.55         1   0

With this data I want to do a stacked boxplot with x = Meters, y = count, and Treatment as a grid or something like that.

This is my code but it's not working

plot_data <- stack %>% 
  count(Retrieved, Treatment, Meters) %>% 
  group_by(Treatment, Meters) %>% 
  mutate(count= n)

plot_data

ggplot(plot_data, aes(x = Meters, y = count, fill = Treatment)) + 
  geom_col(position = "fill") + 
  geom_label(aes(label = count(count)), position = "fill", color = "white", vjust = 1, show.legend = FALSE) +
  scale_y_continuous(labels = count) 

Could you please tell me what I'm doing wrong.

lebelinoz
Locean

1 Answer

geom_bar is for precisely this case, and you won't even need to use group_by or count. (From the docs: "geom_bar makes the height of the bar proportional to the number of cases in each group".)

This should do what you're looking for:

ggplot(stack, aes(x = Meters, fill = Treatment)) +
  geom_bar(position = "stack")

However, the bars will be very narrow because "Meters" is continuous and has a large range. You could address this by converting it into a factor. One way to do that would be to do this first:

# The data frame in the question is called stack, so convert Meters there
stack <- stack %>%
  mutate(Meters = as.factor(Meters))
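
As a quick check of what the conversion does, here is a minimal base R sketch. The vector meters is a hypothetical handful of values copied from the question's Meters column, not the full data. It suggests that as.factor keeps the levels in numeric order, so the bars should still appear from the smallest to the largest distance:

```r
# Hypothetical Meters values, taken from a few rows of the question's data
meters <- c(20, -5, 9.55, 100, -0.1, 9.55)

meters_f <- as.factor(meters)

# Levels are the sorted unique values, so the x axis keeps numeric order
levels(meters_f)

# To get numbers back from a factor, convert via character first;
# as.numeric() alone would return the internal level codes
as.numeric(as.character(meters_f))
```

Each level gets the same width on the x axis, which is why the bars stop being narrow, but the spacing between bars no longer reflects the actual distances.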

resulting plot
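
The question also asks for the y/n counts with Treatment "as grid". One possible variation, sketched here under assumptions, is to fill by Retrieved and put each Treatment in its own panel with facet_wrap. The data frame stack_demo is a hypothetical six-row subset made up for illustration; with the real data you would pass stack instead.

```r
library(ggplot2)

# Hypothetical subset of the question's data, for illustration only
stack_demo <- data.frame(
  Treatment = c("Control", "Control", "Control", "Mu", "Mu", "Mu"),
  Meters    = c(-5, -5, 20, 3.55, 3.55, 5.9),
  Retrieved = c("y", "y", "n", "n", "y", "y")
)

# geom_bar() counts the rows itself; fill splits each bar by Retrieved,
# and facet_wrap() draws one panel per Treatment
p <- ggplot(stack_demo, aes(x = factor(Meters), fill = Retrieved)) +
  geom_bar(position = "stack") +
  facet_wrap(~ Treatment) +
  labs(x = "Meters", y = "count")

p
```

By default every panel shows all Meters levels; adding scales = "free_x" to facet_wrap would drop the levels a Treatment never uses.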

If you want to get the counts in the format that you mentioned (in addition to creating the plot), you could do:

# count() and rename() come from dplyr, spread() from tidyr
stack %>%
  count(Treatment, Meters, Retrieved) %>%
  spread(Retrieved, n, fill = 0) %>% 
  rename(Yes = y, No = n)

count does group_by for you, so I didn't need to carry that over from your code. Then, spread creates the separate columns for y and n. Finally, I rename those columns to Yes and No.
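
In tidyr 1.0.0 and later, spread is superseded by pivot_wider. The sketch below shows the same reshaping with that function; it is an alternative, not the code from the answer above. The data frame stack_demo is a hypothetical six-row subset made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the question's data, for illustration only
stack_demo <- data.frame(
  Treatment = c("Control", "Control", "Control", "Mu", "Mu", "Mu"),
  Meters    = c(-5, -5, 20, 3.55, 3.55, 5.9),
  Retrieved = c("y", "y", "n", "n", "y", "y")
)

counts <- stack_demo %>%
  count(Treatment, Meters, Retrieved) %>%
  # One column per Retrieved value; combinations that never occur become 0
  pivot_wider(names_from = Retrieved, values_from = n, values_fill = 0) %>%
  rename(Yes = y, No = n)

counts
```

The result has one row per Treatment and Meters combination, with the y and n counts in the Yes and No columns.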

Vivid
  • Thank you! I had sort of found a way to do it but my "x=Meters" was still continuous and the graphs looked hideous. Other than that, I was using the function "count" from the plyr package `count = count(d, c('Treatment','Meters','Retrieved')) count` but that was only giving a frequency table. – Locean Jun 19 '17 at 07:17