
Below is the code I am having trouble with and its output. The data set is linked at the bottom of the post.

  1. I want the rows grouped by StateCode, with each MSN listed under its state (the opposite of what the output shows now).
  plotdata <- EnergyData %>% 
  filter(MSN %in% c("BMTCB", "GETCB", "HYTCB", "SOTCB", "WYTCB")) %>%
  filter(Year %in% c("2009")) %>%
  select(StateCode, MSN, Data) %>%
  group_by(StateCode) %>%
  mutate(pct = Data/sum(Data),
         lbl = scales::percent(pct))
  plotdata 

This outputs to:

[Image: Output]

I thought the group_by function would do that for me. Am I missing a key chunk of code?

  2. Once the above chunk runs correctly, I want to create side-by-side bar charts by StateCode using the percentages of each of the 5 MSNs.

Here's the code I have so far.

  ggplot(EnergyData, 
       aes(x = factor(StateCode,
                      levels = c("AZ", "CA", "NM", "TX")),
           y = pct,
           fill = factor(drv, 
                         levels = c("BMTCB", "GETCB", "HYTCB", "SOTCB", "WYTCB"),
                         labels = c("BMTCB", "GETCB", "HYTCB", "SOTCB", "WYTCB")))) + 
  geom_bar(stat = "identity",
           position = "fill") +
  scale_y_continuous(breaks = seq(0, 1, .2), 
                     label = pct) +
  geom_text(aes(label = lbl), 
            size = 3, 
            position = position_stack(vjust = 0.5)) +
  scale_fill_brewer(palette = "Set2") +
  labs(y = "Percent", 
       fill = "MSN",
       x = "State",
       title = "Renewable Resources by State") +
  theme_minimal()

I believe the problem comes down to how I create the percentages for the bar charts.

Any assistance would be great. Thank you!

Here's the data I used: Energy Data (http://www.mathmodels.org/Problems/2018/MCM-C/ProblemCData.xlsx)

drussell
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data (not a picture of it), and a clear explanation of what hasn't worked with just the code necessary to help debug the issue – camille Mar 20 '20 at 20:53
  • Thank you @camille I just edited the post. Hopefully my questions come off a little clearer now! – drussell Mar 20 '20 at 21:14

1 Answer


Here is a version using data.table for the initial filtering, and changes to the plot function that hopefully get you the result you are after:

library(readxl)
library(data.table)
library(ggplot2)

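# mode = "wb" keeps the binary .xlsx intact on Windows; the ~/ex directory must already exist
download.file("http://www.mathmodels.org/Problems/2018/MCM-C/ProblemCData.xlsx", "~/ex/ProblemCData.xlsx", mode = "wb")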

# by default, factor levels will be in alphabetical order, so we do not need to specify that
EnergyData <- data.table(read_xlsx("~/ex/ProblemCData.xlsx"), key="StateCode", stringsAsFactors = TRUE)

# filter by Year and MSN list
plotdata <- EnergyData[as.character(MSN) %chin% c("BMTCB", "GETCB", "HYTCB", "SOTCB", "WYTCB") & Year == 2009]

# calculate percentages of Data by StateCode
plotdata[, pct := Data/sum(Data), by = "StateCode"]

# plot using percent format and specified number of breaks
ggplot(plotdata, 
       aes(x = StateCode,
           y = pct,
           fill = MSN)) + 
    geom_bar(stat = "identity",
             position = "fill") +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1), n.breaks = 6) +
    scale_fill_brewer(palette = "Set2") +
    labs(y = "Percent", 
         fill = "MSN",
         x = "State",
         title = "Renewable Resources by State") +
    theme_minimal()
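
The question also asks for side-by-side bars with percentage labels, which position = "fill" (stacked bars) does not give. Note too that the plot in the question passes EnergyData rather than plotdata and maps fill to drv, which is not among the columns selected there. Below is a minimal sketch of the dodged variant, assuming a small made-up data frame (the toy values are hypothetical and not taken from the data set) in place of the real plotdata. position_dodge() puts one bar per MSN next to each other within each state, and geom_text() uses the same dodge width so each label sits over its own bar.

```r
library(ggplot2)

# Hypothetical toy values standing in for the filtered 2009 data
toy <- data.frame(
  StateCode = rep(c("AZ", "CA"), each = 3),
  MSN       = rep(c("BMTCB", "GETCB", "HYTCB"), times = 2),
  Data      = c(10, 30, 60, 20, 20, 40)
)

# Share of each MSN within its state (grouped sum via ave())
toy$pct <- toy$Data / ave(toy$Data, toy$StateCode, FUN = sum)
toy$lbl <- scales::percent(toy$pct, accuracy = 1)

p <- ggplot(toy, aes(x = StateCode, y = pct, fill = MSN)) +
  # one bar per MSN, placed side by side within each state
  geom_col(position = position_dodge(width = 0.9)) +
  # same dodge width so labels line up with their bars
  geom_text(aes(label = lbl),
            position = position_dodge(width = 0.9),
            vjust = -0.5, size = 3) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_fill_brewer(palette = "Set2") +
  labs(y = "Percent", fill = "MSN", x = "State",
       title = "Renewable Resources by State") +
  theme_minimal()

p
```

With the real data, the same layers should work by swapping toy for plotdata, provided plotdata has the pct and lbl columns.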

Created on 2020-03-20 by the reprex package (v0.3.0)
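
On the first point in the question: group_by() only records the grouping that later verbs such as mutate() use. It does not reorder rows, so the printed table keeps its original order. If the goal is to see each state's rows together, a minimal dplyr sketch (on made-up toy values, not the real data) is to add arrange() after computing the percentages:

```r
library(dplyr)

# Hypothetical toy values, deliberately ordered by MSN rather than by state
toy <- data.frame(
  MSN       = rep(c("BMTCB", "GETCB"), each = 2),
  StateCode = rep(c("AZ", "CA"), times = 2),
  Data      = c(10, 20, 30, 60)
)

plotdata <- toy %>%
  group_by(StateCode) %>%                        # percentages are computed per state
  mutate(pct = Data / sum(Data),
         lbl = scales::percent(pct, accuracy = 1)) %>%
  ungroup() %>%
  arrange(StateCode, MSN)                        # this is what reorders the rows

plotdata
```

The percentages are the same with or without arrange(); only the row order of the printed table changes.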

user12728748