
I am having a problem where geom_smooth() does not draw anything on my ggplot. Based on previous posts, I think the cause is that my date variable is a character vector. I am trying to convert my dates from class character to class Date, but using as.Date() results in a class "unknown" for my date variable.

Here is my code to attempt to fix my class type:

library(dplyr)    # %>%, select(), mutate()
library(janitor)  # clean_names()

allmovies <- allmovies %>%
  clean_names() %>%
  select(movie, total_box_office, theatrical_release_release_date,
         running_time, mpaa, metacritic, sentiment) %>%
  mutate(theatrical_release_release_date = as.character(theatrical_release_release_date)) %>%
  mutate(theatrical_release_release_date = as.Date(theatrical_release_release_date, format = "%Y-%m-%d"))

And here is the code I use to plot with geom_smooth(), in case anyone can spot the error:

library(ggplot2)
library(plotly)  # ggplotly(), layout()
library(scales)  # comma

ggplotly(tooltip = c("text"),
         ggplot(data = allmovies,
                aes(x = theatrical_release_release_date, y = total_box_office,
                    color = mpaa, text = movie)) +
           geom_point() +
           geom_smooth(method = lm) +
           scale_y_continuous(labels = comma) +
           labs(color = "MPAA Rating") +
           ylab("Total Box Office Revenue") +
           xlab("Theatrical Release Date") +
           ggtitle("Total Box Office Revenue Over Time",
                   subtitle = "While revenue generally improved over time, a further analysis shows PG rated movies generated much more revenue over time while PG-13 and R-rated revenue correlations do not appear to be significant.")) %>%
  layout(title = "Total Box Office Revenue Over Time",
         font = font)  # `font` is defined elsewhere in the script (not shown)

Finally, here is a sample of my data of the date column:

dput(head(allmovies$theatrical_release_release_date))
c("2013-08-23", "2013-03-22", "2012-09-14", "2012-03-16", "2012-02-17", 
"2011-10-14")
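These strings are in `YYYY-MM-DD` layout, which is exactly what `format = "%Y-%m-%d"` describes. A minimal check, assuming dplyr is installed, that the question's `mutate(as.Date(...))` step yields class `Date` on this sample; `sample_movies` is a hypothetical stand-in for `allmovies` holding only the date column.

```r
library(dplyr)

# Hypothetical stand-in for `allmovies`: only the date column, copied from the dput() output
sample_movies <- tibble(
  theatrical_release_release_date = c("2013-08-23", "2013-03-22", "2012-09-14",
                                      "2012-03-16", "2012-02-17", "2011-10-14")
)

# Same conversion step as in the question
converted <- sample_movies %>%
  mutate(theatrical_release_release_date =
           as.Date(theatrical_release_release_date, format = "%Y-%m-%d"))

class(converted$theatrical_release_release_date)  # "Date"
str(converted)
```

If `class()` reports `"Date"` here but the real column still looks wrong, the difference lies in the full data (for example, entries in another layout that become `NA`), not in the conversion call itself.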

And here is a small sample of the whole data frame:

structure(list(movie = c("The Frozen Ground", "The Croods", "Stolen", "Seeking Justice", "Ghost Rider: Spirit of Vengeance", "Trespass" ), total_box_office = c(5617460, 573068425, 17967746, 411746, 149217355, 786532), theatrical_release_release_date = structure(c(15940, 15786, 15597, 15415, 15387, 15261), class = "Date"), running_time = c(105, 98, 96, 104, 95, 90), mpaa = c("R", "PG", "R", "R", "PG-13", "R"), metacritic = c(37, 55, 43, 38, 34, 37), sentiment = c(NA, 0.1363636, NA, NA, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
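In this `dput()` output the date column is already a `Date`: `structure(c(15940, ...), class = "Date")` stores each date as a number of days since 1970-01-01. A short base R sketch (no packages needed) confirming that these day counts are the same six dates as the character sample above:

```r
# Day counts copied from the dput() output
release_days <- c(15940, 15786, 15597, 15415, 15387, 15261)

# A Date is a count of days since 1970-01-01 with class "Date" attached
release_dates <- structure(release_days, class = "Date")
print(release_dates)

# The same dates parsed from the character strings in the earlier dput()
parsed <- as.Date(c("2013-08-23", "2013-03-22", "2012-09-14",
                    "2012-03-16", "2012-02-17", "2011-10-14"))
all(parsed == release_dates)  # TRUE
```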
joran
  • Hi Claire. It may be helpful to give some info on the version of R you're using, any packages you have loaded and OS. – iod Nov 27 '18 at 19:03
  • Please provide a sample of your data as the problem is likely due to some characteristic of your date column. – Mako212 Nov 27 '18 at 19:06
  • @iod Hello! Here is what I get when I run `version`: platform x86_64-apple-darwin15.6.0, arch x86_64, os darwin15.6.0, system x86_64, darwin15.6.0, version.string R version 3.5.1 (2018-07-02). I have loaded: library(shiny), library(tidyverse), library(dplyr), library(readxl), library(ggrepel), library(plotly), library(scales), library(janitor). – claireburst Nov 27 '18 at 19:09
  • @Mako212 I edited my post to add a sample! Thanks! – claireburst Nov 27 '18 at 19:13
  • Samples are better provided as text rather than images. The easiest way is to type in `dput(head(df))` and append whatever you get there to your question. – iod Nov 27 '18 at 19:17
  • @iod sorry about that, fixed! – claireburst Nov 27 '18 at 19:19
  • I can't reproduce the error - can you include the actual error text you're getting? – iod Nov 27 '18 at 19:24
  • @iod I'm not getting an actual error, so I can't provide one. However, geom_smooth(method=lm) is not appearing on my plot and I suspect it may be due to the fact that the class for the theatrical_release_release_date variable is "unknown". – claireburst Nov 27 '18 at 19:26
  • We really don't have enough information to help. The small example of the dates you shared converts to dates just fine for me, with the appropriate class. There could be lots of other reasons the plot isn't working as you expect, but we can't really do any investigating without a full working example that we can run that demonstrates the entire behavior you're seeing. – joran Nov 27 '18 at 19:31
  • @joran Hello -- I've gotten the class to convert, thank you for the feedback. I'm not sure what more info I can provide to help figure out why the geom_smooth() isn't appearing. – claireburst Nov 27 '18 at 19:36
  • Ideally, you'd provide enough of a data sample that we could run the entire bit of code you shared, including the code that produces the graph. – joran Nov 27 '18 at 19:37
  • @joran the code that produces the graph is in the post. dput(head(allmovies)) is also provided below in response to QAsena's response -- too many characters to post here. Thanks! – claireburst Nov 27 '18 at 19:40
  • I think maybe `text = movie` is causing `ggplot` to attempt to fit a model to each movie. Try removing that. – joran Nov 27 '18 at 19:44
  • @joran Yes, thank you, that solved it! However, `text = movie` was allowing plotly to display the movie name when the mouse pointer hovers over each point. Is there any way to keep that feature without confusing geom_smooth()? – claireburst Nov 27 '18 at 19:45
  • You want the text aesthetic probably only in the `geom_point` layer. When you put them in the top `ggplot()` call, every subsequent layer inherits all those aesthetic mappings. And you don't want them all in `geom_smooth()`. – joran Nov 27 '18 at 19:47
  • @joran you've solved it :) thank you so much for the help and sorry for any confusion! – claireburst Nov 27 '18 at 20:21
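The fix from the comments can be sketched as follows, assuming ggplot2 is installed (plotly only for the optional last step). `movies` is a hypothetical stand-in for `allmovies` rebuilt from the six-row sample, and the question's title and `font` settings are left out. With `text = movie` in the top-level `aes()`, the character `text` column becomes part of the grouping, so every movie is its own one-point group and `geom_smooth()` has nothing to fit; mapping `text` only in `geom_point()` keeps the tooltip while the smooth groups by `mpaa` alone.

```r
library(ggplot2)

# Hypothetical stand-in for `allmovies`, rebuilt from the dput() sample
movies <- data.frame(
  movie = c("The Frozen Ground", "The Croods", "Stolen", "Seeking Justice",
            "Ghost Rider: Spirit of Vengeance", "Trespass"),
  total_box_office = c(5617460, 573068425, 17967746, 411746, 149217355, 786532),
  theatrical_release_release_date = as.Date(c("2013-08-23", "2013-03-22", "2012-09-14",
                                              "2012-03-16", "2012-02-17", "2011-10-14")),
  mpaa = c("R", "PG", "R", "R", "PG-13", "R")
)

# Original mapping: `text` is inherited by every layer, so each movie is its own group
p_bad <- ggplot(movies, aes(x = theatrical_release_release_date, y = total_box_office,
                            color = mpaa, text = movie)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Fixed mapping: `text` only on the points, so the smooth is grouped by `mpaa` alone
p_good <- ggplot(movies, aes(x = theatrical_release_release_date, y = total_box_office,
                             color = mpaa)) +
  geom_point(aes(text = movie)) +  # ggplot2 may warn about an unknown aesthetic; plotly reads it
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Theatrical Release Date", y = "Total Box Office Revenue",
       color = "MPAA Rating")

# Rows computed for the smooth layer (layer 2): none for p_bad, some for p_good
nrow(layer_data(p_bad, 2))
nrow(layer_data(p_good, 2))

# Optional: interactive version with movie names in the tooltip
if (requireNamespace("plotly", quietly = TRUE)) {
  plotly::ggplotly(p_good, tooltip = "text")
}
```

In this six-row sample only the R-rated group has more than one release date, so `p_good` shows a single fitted line; PG and PG-13 have one point each and get no fit. On the full data, each rating with at least two distinct dates should get its own line.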

1 Answer


Try the lubridate package:

install.packages("lubridate")
library(lubridate)

Then use the ymd() function, which expects year-month-day order (see ?ymd for the dmy() and mdy() variants that handle other orders):

allmovies <- allmovies %>%
  clean_names() %>%
  select(movie, total_box_office, theatrical_release_release_date,
         running_time, mpaa, metacritic, sentiment) %>%
  mutate(theatrical_release_release_date = as.character(theatrical_release_release_date)) %>%
  mutate(theatrical_release_release_date = ymd(theatrical_release_release_date))
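A minimal sketch, assuming lubridate is installed, of `ymd()` applied to the six sample strings from the question and checked against base R's `as.Date()` on the same input; `release_chr` is just a local name for that sample.

```r
library(lubridate)

# Sample strings from the question's dput() output (year-month-day order)
release_chr <- c("2013-08-23", "2013-03-22", "2012-09-14",
                 "2012-03-16", "2012-02-17", "2011-10-14")

# ymd() reads year, month, day; dmy() and mdy() expect other orders
release_dates <- ymd(release_chr)

class(release_dates)                       # "Date"
all(release_dates == as.Date(release_chr))  # TRUE: same dates as base R here
```

For ISO-style strings like these, base `as.Date()` and `ymd()` agree; lubridate mainly helps when the layout varies or is not `YYYY-MM-DD`.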

If this doesn't work, use dput() to provide a sample of your data and I'll edit my answer accordingly :).

QAsena
  • Hello! I've provided a sample of the data in the edited version of this post -- unfortunately, trying lubridate with "mutate(theatrical_release_release_date = ymd(theatrical_release_release_date))" still yields an "unknown" class. – claireburst Nov 27 '18 at 19:24
  • ok try now - I changed it from `dmy` to `ymd` - seems to work with the dput you provided. use `str(allmovies)` to see if the date column is character, factor or date. It should be date now :) – QAsena Nov 27 '18 at 19:29
  • Thank you, this worked -- do you have any idea, however, why my geom_smooth() does not appear on my plot? – claireburst Nov 27 '18 at 19:34
  • can you use `dput(head(allmovies))` to give a sample of the whole dataframe and I'll take a look. You only need to copy the console output :) – QAsena Nov 27 '18 at 19:36
  • structure(list(movie = c("The Frozen Ground", "The Croods", "Stolen", "Seeking Justice", "Ghost Rider: Spirit of Vengeance", "Trespass" ), total_box_office = c(5617460, 573068425, 17967746, 411746, 149217355, 786532), theatrical_release_release_date = structure(c(15940, 15786, 15597, 15415, 15387, 15261), class = "Date"), running_time = c(105, 98, 96, 104, 95, 90), mpaa = c("R", "PG", "R", "R", "PG-13", "R"), metacritic = c(37, 55, 43, 38, 34, 37), sentiment = c(NA, 0.1363636, NA, NA, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame")) – claireburst Nov 27 '18 at 19:37
  • Thanks, seems like @joran solved this part - I didn't manage! I think you had the right idea with `as.Date`, but maybe it should look like this? `allmovies$theatrical_release_release_date <- as.Date(allmovies$theatrical_release_release_date, format = '%Y-%m-%d')` I find lubridate easier :) – QAsena Nov 27 '18 at 20:11
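The `format` argument of `as.Date()` has to describe the strings exactly, separators included. A base R sketch on one of the question's sample dates: the `YYYY-MM-DD` pattern parses, while a slash-based pattern such as `'%m/%d'` fails and returns `NA` instead of raising an error.

```r
x <- "2013-08-23"  # one of the sample dates from the question

as.Date(x, format = "%Y-%m-%d")  # layout matches: "2013-08-23"
as.Date(x, format = "%m/%d")     # "20" is not a valid month: NA
as.Date(x, format = "%Y/%m/%d")  # "-" does not match "/": NA
```

Because a mismatched format yields `NA` silently, `anyNA()` on the converted column is a quick check that the pattern fits the whole column.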