I'm having a very, very tough time getting the x-axis to look correct for my graphs.

Here is my data:

df <- data.frame(
  Month = factor(c(
    "2011-07-31", "2011-08-31", "2011-09-30", "2011-10-31", "2011-11-30",
    "2011-12-31", "2012-01-31", "2012-02-29", "2012-03-31", "2012-04-30",
    "2012-05-31", "2012-06-30"
  )),
  AvgVisits = c(
    6.98655104580674, 7.66045407330464, 7.69761337479304, 7.54387561322994,
    7.24483848458728, 6.32001400498928, 6.66794871794872, 7.207780853854,
    7.60281201431308, 6.70113837397123, 6.57634103019538, 6.75321935568936
  )
)

Here is the chart I am trying to graph:

ggplot(df, aes(x = Month, y = AvgVisits)) + 
  geom_bar() +
  theme_bw() +
  labs(x = "Month", y = "Average Visits per User")

That chart works fine - but, if I want to adjust the formatting of the date, I believe I should add this: scale_x_date(labels = date_format("%m-%Y"))

I'm trying to make it so the date labels are 'MMM-YYYY'

ggplot(df, aes(x = Month, y = AvgVisits)) + 
  geom_bar() +
  theme_bw() +
  labs(x = "Month", y = "Average Visits per User") +
  scale_x_date(labels = date_format("%m-%Y"))

When I plot that, I continue to get this error:

stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Despite hours of research on formatting of geom_line and geom_bar, I can't fix it. Can anyone explain what I'm doing wrong?

Edit: As a follow-up thought: Can you use date as a factor, or should you use as.Date on a date column?

moodymudskipper
mikebmassey

2 Answers

127

To show months as Jan 2017 Feb 2017 etc:

scale_x_date(date_breaks = "1 month", date_labels = "%b %Y")

Angle the dates if they take up too much space:

theme(axis.text.x=element_text(angle=60, hjust=1))
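
Putting the two pieces together with the question's data, a minimal sketch might look like the following. It assumes `Month` has been converted with `as.Date()` first, since `scale_x_date()` needs a Date column rather than a factor. It also adds `theme_bw()` before `theme()`, because a complete theme replaces any `theme()` settings added earlier. The three-row `df` is a trimmed, rounded copy used only for illustration.

```r
library(ggplot2)

# Trimmed copy of the question's data (illustration only)
df <- data.frame(
  Month = as.Date(c("2011-07-31", "2011-08-31", "2011-09-30")),
  AvgVisits = c(6.99, 7.66, 7.70)
)

p <- ggplot(df, aes(x = Month, y = AvgVisits)) +
  geom_bar(stat = "identity") +  # bar heights are already aggregated
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme_bw() +                   # complete theme first ...
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +  # ... then tweaks
  labs(x = "Month", y = "Average Visits per User")

print(p)
```

The month abbreviations produced by `%b` depend on the session's locale.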
  • 9
    Almost went crazy trying to figure out why this wouldn't work. Figured out that `theme_bw()` overwrites anything `theme()` does. Reordered, and it works just fine. Leaving it here in case someone is having the same issue. – Ufos Dec 03 '17 at 18:18
  • 1
    Appreciate your answer, this helps me, write down my words to show my appreciation – cloudscomputes Jan 16 '18 at 03:36
  • 4
    This should definitely be the accepted answer to this question. The other one requires you to have the "Scales" library installed and loaded, whereas this one uses functionality that already exists in ggplot2. – AmphotericLewisAcid Jan 14 '20 at 23:43
  • @Ufos is it true that whatever is set last takes priority? – stevec Jul 28 '20 at 16:52
  • 1
    @stevec that is true for most programming languages. It's more important here that `theme_bw()` overwrites other `theme()` configurations, which could be confusing. It's only tangentially related to this answer. – Ufos Aug 05 '20 at 13:06
  • 2
    @AmphotericLewisAcid ggplot2 imports scales, so if someone is using ggplot, they've already got scales installed – camille Nov 06 '21 at 02:45
  • Wow, this is great. I didn't know about `scale_*_date` before, it makes things very easy! – njp Sep 07 '22 at 00:05
89

Can you use date as a factor?

Yes, but you probably shouldn't.

...or should you use as.Date on a date column?

Yes.

Which leads us to this:

library(scales)
df$Month <- as.Date(df$Month)
ggplot(df, aes(x = Month, y = AvgVisits)) + 
  geom_bar(stat = "identity") +
  theme_bw() +
  labs(x = "Month", y = "Average Visits per User") +
  scale_x_date(labels = date_format("%m-%Y"))

in which I've added stat = "identity" to your geom_bar call.
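
To see what `stat = "identity"` changes: left at its default, `geom_bar()` computes the bar heights itself from the rows of data, whereas `stat = "identity"` draws the bars at the y values you supply. In current ggplot2 releases, `geom_col()` is shorthand for the same thing. The sketch below, which uses a trimmed and rounded copy of the data purely for illustration, compares the heights each layer ends up with:

```r
library(ggplot2)

# Trimmed copy of the question's data (illustration only)
df <- data.frame(
  Month = as.Date(c("2011-07-31", "2011-08-31", "2011-09-30")),
  AvgVisits = c(6.99, 7.66, 7.70)
)

base <- ggplot(df, aes(x = Month, y = AvgVisits))
p_identity <- base + geom_bar(stat = "identity")
p_col <- base + geom_col()

# Bar heights as computed for each layer
heights_identity <- layer_data(p_identity)$y
heights_col <- layer_data(p_col)$y

# Both layers should use the supplied values unchanged
print(all.equal(sort(heights_identity), sort(heights_col)))
print(all.equal(sort(heights_identity), sort(df$AvgVisits)))
```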

In addition, the message about the binwidth wasn't an error. An error will actually say "Error" in it, and similarly a warning will always say "Warning" in it. Otherwise it's just a message.
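
The distinction can be checked directly in base R, where `message()`, `warning()`, and `stop()` signal three different condition classes. A small sketch (the helper name `classify` is made up for this example):

```r
# Report which kind of condition, if any, an expression signals
classify <- function(expr) {
  tryCatch(
    {
      expr  # force evaluation inside tryCatch
      "no condition"
    },
    message = function(m) "message",
    warning = function(w) "warning",
    error   = function(e) "error"
  )
}

classify(message("binwidth defaulted to range/30"))  # "message"
classify(warning("something looks off"))             # "warning"
classify(stop("cannot continue"))                    # "error"
classify(1 + 1)                                      # "no condition"
```

Only the `stop()` case would halt a script; messages and warnings let the code carry on.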

joran
  • Spot on. Thanks. I hadn't seen `stat = "identity"` before. Going to research that more now. Thanks. – mikebmassey Jul 31 '12 at 20:56
  • @mikebmassey The basic thing to remember with `stat = "identity"` is that if you're doing bar plots and you've already aggregated the data so you have the heights of each bar already, then use it. – joran Jul 31 '12 at 20:57
  • Follow-up related to a line chart for this: so this is only applicable to bar plots? I just tried to plug the same thing into a `geom_line`, with and without `stat = "identity"`, and I get this warning: `geom_path: Each group consist of only one observation.` If I only have 1 data group, why would I need to group to make it work? Thanks – mikebmassey Jul 31 '12 at 21:03
  • @mikebmassey It's a little complicated. Basically, while it's obvious to _you_ that there's only one group, it's very difficult to write the ggplot code in such a way that it can _always_ tell if that's the case with every possible data set. Hence it's safer to rely on user input to explicitly specify that piece of information. – joran Jul 31 '12 at 21:08
  • 2
    `Error in date_format("%m-%Y") : could not find function "date_format"` – Peyman Mar 21 '21 at 22:48
  • 1
    @Peyman you need `library(scales)` for `date_format` – Eric Krantz Aug 31 '21 at 15:35
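
On the `geom_line` follow-up in the comments: the "only one observation" message typically appears when x is a factor, because ggplot2 then treats each level as its own group. Two possible ways around it, sketched here with a trimmed and rounded copy of the data, are to set `group = 1` explicitly or to convert `Month` to Date so the axis is continuous:

```r
library(ggplot2)

# Trimmed copy of the question's data, with Month still a factor
df <- data.frame(
  Month = factor(c("2011-07-31", "2011-08-31", "2011-09-30")),
  AvgVisits = c(6.99, 7.66, 7.70)
)

# Option 1: keep the factor, but state that all points form one line
p_factor <- ggplot(df, aes(x = Month, y = AvgVisits, group = 1)) +
  geom_line()

# Option 2: use real dates; no group aesthetic is needed
df$Month <- as.Date(df$Month)
p_date <- ggplot(df, aes(x = Month, y = AvgVisits)) +
  geom_line() +
  scale_x_date(date_labels = "%b %Y")

print(p_factor)
print(p_date)
```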