
I want to make a time series plot for weekdays only (i.e. excluding weekends and holidays). If I simply use ggplot with the date on the x-axis and y on the y-axis, the distance between a Monday and a Tuesday will not be the same as the distance between a Friday and the following Monday. Below is a daily data set with a date column.

df <- structure(list(PROCEDURE_DATO_DATO = structure(c(17533, 17534, 17535, 17536, 17539, 
                                                       17540, 17541, 17542, 17543, 17546, 
                                                       17547, 17548, 17549, 17550, 17553, 
                                                       17554, 17555, 17556, 17557, 17560), 
                                                     class = "Date"), 
                     Antal_akutte = c(17, 31, 22, 18, 25,
                                      26, 20, 20, 21, 19, 
                                      25, 26, 27, 14, 14, 
                                      39, 21, 23, 20, 13), 
                     Antal_besog = c(42L, 60L, 58L, 58L, 56L, 
                                     61L, 44L, 48L, 47L, 44L, 
                                     58L,60L, 58L, 45L, 38L, 
                                     73L, 49L, 50L, 53L, 40L), 
                     Andel = c(0.404761904761905, 0.516666666666667, 0.379310344827586, 
                               0.310344827586207, 0.446428571428571, 0.426229508196721, 
                               0.454545454545455, 0.416666666666667, 0.446808510638298, 
                               0.431818181818182, 0.431034482758621, 0.433333333333333, 
                               0.46551724137931, 0.311111111111111, 0.368421052631579, 
                               0.534246575342466, 0.428571428571429, 0.46, 0.377358490566038, 0.325)), 
                .Names = c("PROCEDURE_DATO_DATO", "Antal_akutte", "Antal_besog", "Andel"), 
                row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

If I simply add a row number, I lose the dates on the axis. How can I use the row number for the x-axis but label it with the date column?

df %>% 
  mutate(row = row_number()) %>% 
  ggplot(aes(row, Antal_akutte)) +
  geom_line()

If I try to set the labels with scale_x_continuous, I get an error:

df %>% 
  mutate(row = row_number(), 
         PROCEDURE_DATO_DATO = as.character(PROCEDURE_DATO_DATO)) %>%
  ggplot(aes(row, Antal_akutte)) +
    geom_line() +
    scale_x_continuous(labels = seq.Date(as.Date("2018-01-02"), as.Date("2018-12-31"), by = "q"))

Error in f(..., self = self) : Breaks and labels are different lengths

OTStats
xhr489
  • No. A time series is continuous: every day is included. You could instead create a column of observation numbers, or find some other way to assign consecutive numbers to just the weekdays, then set labels however you want. It's unclear how you're trying to label this or build the plot, or what type of plot you want, since none of that code is included here – camille Jan 28 '19 at 16:06
  • @camille: Thanks. At first I also simply made a row number and used it for my x-axis, but then I did not have the date labels. – xhr489 Jan 28 '19 at 16:11
  • You can set labels, such as with `scale_x_continuous`. But again, without the code you're using, it's hard to help specifically – camille Jan 28 '19 at 16:14

1 Answer


You can convert your data to an eXtensible Time Series (xts) object, which makes working with time series easier. Then use autoplot to plot the xts object with ggplot2:

# load libraries
library(ggplot2)
library(xts)

# create an xts object: a matrix of observations ordered by a date index
# (in your case `df$PROCEDURE_DATO_DATO`)
df_xts <- xts(df[, -1], order.by = df$PROCEDURE_DATO_DATO)

# make the plot
autoplot(df_xts, geom="line") 

Let's look at a few observations that span the first weekend of January:

> df_xts[3:6,]
           Antal_akutte Antal_besog     Andel
2018-01-04           22          58 0.3793103
2018-01-05           18          58 0.3103448
2018-01-08           25          56 0.4464286
2018-01-09           26          61 0.4262295

I'll use geom = "point" to clearly indicate the missing data points during the weekend.

autoplot(df_xts[3:6,], geom="point")


Update: To plot without the weekend gaps, your row-number approach works as long as breaks and labels have the same length:

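library(dplyr)  # for %>%, mutate() and row_number()

# keep rows 3 to 6, add a row index, and turn the dates into character labels
df <- df[3:6,] %>%
  mutate(row = row_number(),
         PROCEDURE_DATO_DATO = as.character(PROCEDURE_DATO_DATO))

# one break per row, labelled with the matching date
ggplot(df, aes(row, Antal_akutte)) +
  geom_line() +
  scale_x_continuous(breaks = df$row, labels = df$PROCEDURE_DATO_DATO)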

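With all 20 rows, a label on every observation would crowd the axis. Here is a minimal sketch that keeps the row index on the x-axis but labels only every fifth observation. Because `breaks` and `labels` are built from the same index, their lengths always match, which avoids the "Breaks and labels are different lengths" error. The `idx` name, the step of 5, and the axis title are illustrative choices, not part of the original answer; the data frame is rebuilt from the question's values so the block runs on its own.

```r
library(ggplot2)

# Data from the question (dates are days since 1970-01-01; weekends are absent)
df <- data.frame(
  PROCEDURE_DATO_DATO = as.Date(c(17533, 17534, 17535, 17536, 17539,
                                  17540, 17541, 17542, 17543, 17546,
                                  17547, 17548, 17549, 17550, 17553,
                                  17554, 17555, 17556, 17557, 17560),
                                origin = "1970-01-01"),
  Antal_akutte = c(17, 31, 22, 18, 25, 26, 20, 20, 21, 19,
                   25, 26, 27, 14, 14, 39, 21, 23, 20, 13)
)

# Equally spaced x position: one step per observed weekday
df$row <- seq_len(nrow(df))

# Label only every 5th observation (the step of 5 is an arbitrary choice)
idx <- seq(1, nrow(df), by = 5)

p <- ggplot(df, aes(row, Antal_akutte)) +
  geom_line() +
  scale_x_continuous(
    breaks = df$row[idx],                                      # positions to label
    labels = format(df$PROCEDURE_DATO_DATO[idx], "%Y-%m-%d")   # same length as breaks
  ) +
  labs(x = "Date (weekdays only)")

p
```

The trade-off is that the x-axis is no longer a true time axis: the step from Friday to Monday is drawn the same as the step from Monday to Tuesday, which is the behaviour asked for in the question.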

NRLP
  • @Luminita: Thanks. I tried to make it work with xts before, but I could not make pretty plots with plot() from base R. However, I don't want gaps in the time axis; I would like a single day's distance between Jan 5 and Jan 8. How do people plot daily stock prices? – xhr489 Jan 28 '19 at 18:41
  • @Luminita: Well, I guess the correct way is for the space between Friday and Monday to reflect the actual time span, unlike, e.g., Monday to Tuesday. But does this not contradict the idea that a time series is equally spaced? Or would you say that my time series is equally spaced but has missing values for the weekends? – xhr489 Jan 28 '19 at 19:05
  • @Luminita: I don't understand why I should use an xts object. Why not just use ``df[3:6, ] %>% ggplot(aes(x=PROCEDURE_DATO_DATO)) + geom_line(aes(y=Antal_besog), color="blue")``? – xhr489 Jan 29 '19 at 08:53
  • @David: I find it cleaner to work with xts/zoo objects when time series are involved. Particularly because of how easy it is to manipulate the time index (for instance, in case you need to pre-process your raw data to remove the weekends, `.indexwday` would be my first choice for that). However, a time series *is* continuous (it includes all days). Check out this answer for a way to remove the weekends with ggplot2 (but it uses facets and this may not be what you want): https://stackoverflow.com/questions/14136703/ggplot2-time-series-plotting-how-to-omit-periods-when-there-is-no-data-points – NRLP Jan 29 '19 at 09:12
  • @David: Another library with nice plotting for time series is `PerformanceAnalytics` (see `chart.TimeSeries()`), but again, it will include all days in the plot. I am not sure what a clean way to do this in ggplot2 would be. I see that Yahoo Finance uses ChartIQ for their plots. You may need to consider other tools? – NRLP Jan 29 '19 at 09:22
  • @Luminita: Thanks, I have heard about ``PerformanceAnalytics`` but have not looked into it yet. The way I filtered out weekends was to create a ``wday`` column with ``lubridate`` and then apply a basic ``dplyr`` filter. Anyway, thanks. My problem was that I thought it was "wrong" for my plot to have extra space for the weekends. My reasoning was that a daily plot of stock prices shows no gap for the weekend; that is, the space between Friday and Monday looks the same as the space between Monday and Tuesday. Maybe I am overthinking the issue... Thanks again. – xhr489 Jan 29 '19 at 09:35
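The weekend filtering discussed in the comments can also be sketched in base R, without `lubridate` or xts. This is only an illustration under stated assumptions: `daily`, `value`, `iso_wday` and `weekdays_only` are hypothetical names, the data are made up, and holidays are not handled (they would need a separate list of dates to exclude).

```r
# Hypothetical daily data covering two full weeks, weekends included
daily <- data.frame(
  date  = seq(as.Date("2018-01-01"), as.Date("2018-01-14"), by = "day"),
  value = 1:14
)

# "%u" gives the weekday as a number: 1 = Monday, ..., 7 = Sunday
iso_wday <- as.integer(format(daily$date, "%u"))

# Keep Monday to Friday only
weekdays_only <- daily[iso_wday <= 5, ]

# Consecutive index to use as an equally spaced x position
weekdays_only$row <- seq_len(nrow(weekdays_only))

weekdays_only
```

The resulting `row` column can then serve as the x aesthetic, with the `date` column supplying the axis labels.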