I have a table of maximum trip lengths by month which I am trying to graph in R. When I plot it, the x-axis is not ordered by month; instead, the months are ordered alphabetically.

I'm just getting started in R, and I used the following code from one of the videos I watched, adjusted for my table names:

max_trips <- read.csv("max_and_min_trips.csv")

ggplot(data=max_trips)+
  geom_point(mapping = aes(x=month,y=max_trip_duration))+
  scale_x_month(month_labels = "%Y-%m")

  • Check `str(max_trips)`. It's ordered alphabetically because your columns (both month and duration, guessing from the left alignment in the screenshot) are strings. You can try switching from `read.csv` to `readr::read_csv` to see if it can guess the column types correctly for you. For parsing dates, check `my()` (month-year) from lubridate - https://lubridate.tidyverse.org/reference/ymd.html – margusl Nov 08 '22 at 06:34
  • Please add reproducible data using `dput`. – Shubham Nov 08 '22 at 12:56

1 Answer

The simple answer is that the data in your "month" column is stored as a vector of strings, not as dates. In R, this data type is called "character" (or chr). You can confirm this by typing class(max_trips$month); the result in your console should be "character". The solution is therefore to (1) convert the column to a date and (2) adjust the formatting of the dates on the x axis using scale_x_date() and/or related functions.
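
As a quick check of why the order comes out this way, here is a minimal base R sketch. The vector month_chr is a hypothetical stand-in for max_trips$month with made-up values, since the real data isn't shown in the question.

```r
# Hypothetical stand-in for max_trips$month; the values are made up
month_chr <- c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19")

class(month_chr)         # "character"
is.character(month_chr)  # TRUE

# Sorting treats the values as text, so the result is not chronological
# (the exact order can depend on your locale)
sort(month_chr)
```

A character column mapped to the x axis is treated as discrete, which is why the axis follows this text ordering rather than the calendar.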

I'll demonstrate the process with a simple example dataset and plot. Here's the basic data frame and plot. As you'll see, the x axis is again ordered "alphabetically" rather than chronologically, because the mydf$dates values are stored as text and not as dates in "month/year" format.

library(ggplot2)    # needed for ggplot() and geom_point()
library(lubridate)
mydf <- data.frame(
  dates = c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19"),
  yvals = c(13, 31, 14, 10, 20, 18))
ggplot(mydf, aes(x = dates, y = yvals)) + geom_point()

Convert to Date

To convert to a date, you can use a few different functions, but I find the lubridate package particularly useful here. The as_date() function will do the conversion; however, we cannot just apply as_date() directly to mydf$dates, or we get NA values and the following warning in the console:

> as_date(mydf$dates)
[1] NA NA NA NA NA NA
Warning message:
All formats failed to parse. No formats found. 

Since there are many ways to format data that correspond to dates, date-times, etc., we need to specify that our data is in "month/year" format. The other key point is that a Date value must specify year, month, and day. Our data only specifies month and year, so we first need to add an arbitrary "day" to each value before converting. Here's something that works:

mydf$dates <- as_date(
  paste0("1/", mydf$dates),  # need to add a "day" to correctly format for date
  format = "%d/%m/%y"        # nomenclature from strptime()
)

The paste0(...) call adds "1/" before each value in mydf$dates, and the format = argument specifies that the character values should be read as "day/month/year". For more information on the format codes for dates, see the help page for the strptime() function (?strptime).
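
To make the format codes a little more concrete, here is a small base R sketch (as.Date() and format() rather than lubridate, purely so it runs without extra packages). The input string is made up for illustration.

```r
# %d = day, %m = month number, %y = two-digit year
x <- as.Date("1/12/21", format = "%d/%m/%y")
x                    # "2021-12-01"

# Swapping %d and %m changes how the same text is read
as.Date("1/12/21", format = "%m/%d/%y")   # "2021-01-12"

# The same codes control how a Date is printed back as text
format(x, "%Y-%m")   # "2021-12"
format(x, "%m/%y")   # "12/21"
```

The same codes are what you pass to date_labels= when formatting the axis of a date scale.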

Now our column is in date format:

> mydf$dates
[1] "2021-01-01" "2020-02-01" "2021-12-01" "2019-03-01" "2019-10-01" "2019-09-01"
> class(mydf$dates)
[1] "Date"

Changing Date Scale

When plotting now, the data is organized in the proper order along a date scale x axis.

p <- ggplot(mydf, aes(x = dates, y = yvals)) + geom_point()
p

If the labeling isn't quite what you are looking for, check the documentation for the scale_x_date() function for some suggestions. The basic idea is to set the spacing of the breaks with breaks= in your scale and control how they are labeled with date_labels=.

p + scale_x_date(breaks="4 months", date_labels = "%m/%y")

In the OP's case, assuming the month column is text in the same "month/year" format as my example, I would suggest the following code:

library(ggplot2)    # needed for ggplot(), geom_point(), scale_x_date()
library(lubridate)

max_trips <- read.csv("max_and_min_trips.csv")
max_trips$month <- as_date(
  paste0("1/", max_trips$month),
  format = "%d/%m/%y")

ggplot(data=max_trips)+
  geom_point(mapping = aes(x=month,y=max_trip_duration))+
  scale_x_date(breaks = "1 month", date_labels = "%Y-%m")
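
Because the question doesn't show the exact text stored in the month column, it may be worth checking for values that failed to parse before plotting. This is a minimal base R sketch under that assumption; month_chr and parsed are hypothetical names, and the values are made up, with the last one deliberately not in "month/year" form.

```r
# Hypothetical month strings; the last one does not match "month/year"
month_chr <- c("1/21", "2/20", "Mar 2021")

# Same conversion idea as above, using base R's as.Date()
parsed <- as.Date(paste0("1/", month_chr), format = "%d/%m/%y")

# Values that could not be parsed come back as NA
month_chr[is.na(parsed)]   # "Mar 2021"
```

If this returns anything for your data, the format string needs to be adjusted to match how the months are actually written in the CSV.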
chemdork123