I have a data frame containing 2 weeks of data on how many passengers were on a train each day. Each observation has 3 values: the date, the number of passengers, and the day of the week. I want to compare the passengers on each day of the previous week with the same day this week (Monday to Monday, Tuesday to Tuesday, etc.). Here is the data:

x <- structure(list(total = structure(c(17455, 17456, 17457, 17458, 
17459, 17460, 17461, 17462, 17463, 17464, 17465, 17466, 17467, 
17468), class = "Date"), passengers = c(9299L, 9166L, 10234L, 
10176L, 10098L, 2867L, 5416L, 9312L, 10555L, 10858L, 10169L, 
9515L, 2679L, 5490L), dow = c("Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Saturday", "Sunday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Saturday", "Sunday")), .Names = 
c("total", "passengers", "dow"), class = "data.frame")

(The automated system that created the reports uses the name "total" for the date column; I point that out because it might be confusing.)

When I create a ggplot, it only maps 1 y value for a bar chart instead of 2 side by side:

ggplot(x, aes(x=dow, y=passengers), fill=variable) + 
  geom_bar(stat = "identity", position = "dodge")

I have seen reshape used to melt the data in cases like this, but when I melt using the day of the week as the id.vars value, the date is converted to scientific notation (small problem) and ggplot cannot find the passengers variable (big problem).

Gregor Thomas
Brad

1 Answer

Some issues to be addressed:

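  1. you specified fill = variable (outside aes(), so it is never treated as an aesthetic mapping), and there's no column named "variable" in your data frame anyway;
  2. you expect 2 dodged bars side by side for each day, but nothing in the data or the plot call tells ggplot which rows should be dodged against each other.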

I would wrangle the data frame first:

library(dplyr)

df <- x %>%
  mutate(week = format(total, "%V"),
         dow = factor(dow, levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                                      "Friday", "Saturday", "Sunday")))

> head(df)
       total passengers       dow week
1 2017-10-16       9299    Monday   42
2 2017-10-17       9166   Tuesday   42
3 2017-10-18      10234 Wednesday   42
4 2017-10-19      10176  Thursday   42
5 2017-10-20      10098    Friday   42
6 2017-10-21       2867  Saturday   42

This adds a "week" variable, which takes the value 42 for the first 7 rows and 43 for the next 7. The days of the week are also now ordered from Monday to Sunday.
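For readers who have not used dplyr, the same two steps can be sketched in base R. This is only an illustrative equivalent, not part of the original answer: the data frame is rebuilt from the values in the question, and `day_levels` is a helper name introduced here. Note that `"%V"` (ISO 8601 week number) is assumed to be supported for output on your platform.

```r
# Helper vector (hypothetical name) giving the desired Monday-to-Sunday order.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Rebuild the question's data frame; values copied from the dput() output.
x <- data.frame(
  total      = as.Date("2017-10-16") + 0:13,
  passengers = c(9299L, 9166L, 10234L, 10176L, 10098L, 2867L, 5416L,
                 9312L, 10555L, 10858L, 10169L, 9515L, 2679L, 5490L),
  dow        = rep(day_levels, times = 2),
  stringsAsFactors = FALSE
)

# Base R equivalent of the mutate() call above.
df <- x
df$week <- format(df$total, "%V")               # week number as a character string
df$dow  <- factor(df$dow, levels = day_levels)  # fix the axis order

# Each week should contribute exactly 7 rows.
table(df$week)
```

Because `week` is stored as a character string, ggplot treats it as a discrete variable, which is what a two-colour fill needs.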

ggplot(df, 
       aes(x = dow, y = passengers, fill = week)) + 
  geom_col(position = "dodge")

geom_col() is equivalent to geom_bar(stat = "identity"), but requires less typing.

[Figure: bar chart of passengers by day of week, with the two weeks shown as side-by-side bars]

Z.Lin
  • So the key is adding an identifier variable showing that there are 2 different weeks. If we indicated no fill value, would this still dodge or would the same problem persist? @Z.Lin – Brad Nov 01 '17 at 13:38
  • @Brad If you don't need different fill colours to facilitate comparison, another approach is to include `group = total` within `aes()`. This tells the package that you want each value of "total" (i.e. each day) to be treated as one group, resulting in bars dodged by day. If you don't indicate any identifier, the bars for each day would overlap at the same position (you can verify this by making the bars semi-transparent, e.g. setting `alpha = 0.5` in `geom_bar()`) – Z.Lin Nov 01 '17 at 13:43
  • In keeping my data frame similar to what it was before and adding a week column, I was able to create the graph. (I haven't learned dplyr yet) However, my X values were sorted differently, so the graph goes Friday, Monday, Saturday.... I'm guessing that because you ordered your "levels" argument the way you did, it kept the days of the week in order? @Z.Lin – Brad Nov 01 '17 at 14:03
  • @Brad Yes, by default ggplot orders a character variable's values alphabetically along the axis; a factor variable, on the other hand, follows the order specified by its levels. – Z.Lin Nov 01 '17 at 14:19