The dput of the data is included below. You can scroll straight down to the objective and problem statement; you may not need the data if you have run into this problem before.
Calling required libraries
library(zoo)
library(ggplot2)
library(scales)
library(plotly)
library(ggthemes)
library(forecast)
library(DescTools)
dput of data
dput(ridership.ts)
structure(c(1709L, 1621L, 1973L, 1812L, 1975L, 1862L, 1940L,
2013L, 1596L, 1725L, 1676L, 1814L, 1615L, 1557L, 1891L, 1956L,
1885L, 1623L, 1903L, 1997L, 1704L, 1810L, 1862L, 1875L, 1705L,
1619L, 1837L, 1957L, 1917L, 1882L, 1933L, 1996L, 1673L, 1753L,
1720L, 1734L, 1563L, 1574L, 1903L, 1834L, 1831L, 1776L, 1868L,
1907L, 1686L, 1779L, 1776L, 1783L, 1548L, 1497L, 1798L, 1733L,
1772L, 1761L, 1792L, 1875L, 1571L, 1647L, 1673L, 1657L, 1382L,
1361L, 1559L, 1608L, 1697L, 1693L, 1836L, 1943L, 1551L, 1687L,
1576L, 1700L, 1397L, 1372L, 1708L, 1655L, 1763L, 1776L, 1934L,
2008L, 1616L, 1774L, 1732L, 1797L, 1570L, 1413L, 1755L, 1825L,
1843L, 1826L, 1968L, 1922L, 1670L, 1791L, 1817L, 1847L, 1599L,
1549L, 1832L, 1840L, 1846L, 1865L, 1966L, 1949L, 1607L, 1804L,
1850L, 1836L, 1542L, 1617L, 1920L, 1971L, 1992L, 2010L, 2054L,
2097L, 1824L, 1977L, 1981L, 2000L, 1683L, 1663L, 2008L, 2024L,
2047L, 2073L, 2127L, 2203L, 1708L, 1951L, 1974L, 1985L, 1760L,
1771L, 2020L, 2048L, 2069L, 1994L, 2075L, 2027L, 1734L, 1917L,
1858L, 1996L, 1778L, 1749L, 2066L, 2099L, 2105L, 2130L, 2223L,
2174L, 1931L, 2121L, 2076L, 2141L, 1832L, 1838L, 2132L), .Tsp = c(1991,
2004.16666666667, 12), class = "ts")
Creating data frame of ts object to use ggplot
tsd = data.frame(time = as.Date(ridership.ts),
value = as.matrix(ridership.ts))
Building linear model
ridership.lm <- tslm(ridership.ts ~ trend + I(trend^2))
Adding new column to existing data frame tsd
tsd$linear_fit = as.matrix(ridership.lm$fitted.values)
Defining length of validation and training period
nValid = 36
nTrain = length(ridership.ts) - nValid
Training data
train.ts = window(ridership.ts,
start = c(1991, 1),
end = c(1991, nTrain))
Validation data
valid.ts = window(ridership.ts,
start = c(1991, nTrain + 1),
end = c(1991, nTrain + nValid))
Building model
ridership.lm = tslm(train.ts ~ trend + I(trend^2))
Forecasting using the model we built
ridership.lm.pred = forecast(ridership.lm, h = nValid, level = 0)
Making a data frame of the fitted model values
tsd_train_model = data.frame(time = as.Date(train.ts),
lm_fit_train = as.matrix(ridership.lm$fitted.values))
Making a data frame of the forecast for plotting
forecast_df = data.frame(time = as.Date(valid.ts),
value = as.matrix(ridership.lm.pred$mean))
Creating plot using ggplot
p1 = ggplot(data = tsd,
aes(x = time, y = value)) +
geom_line(color = 'blue') +
ylim(1300, 2300) +
geom_line(data = tsd_train_model,
aes(x = time, y = lm_fit_train),
color = 'red')
p2 = p1 +
geom_line(data = forecast_df,
aes(x = time, y = value),
col = 'red', linetype = 'dotted') +
scale_x_date(breaks = date_breaks('1 years'),
labels = date_format('%b-%y')) +
geom_vline(xintercept = as.numeric(c(tsd_train_model[NROW(tsd_train_model), ]$time, #last date of training period
forecast_df[NROW(forecast_df), ]$time))) #last date of testing period
p3 = p2 +
annotate('text',
x = c(tsd_train_model[NROW(tsd_train_model)/2, ]$time,
forecast_df[NROW(forecast_df) / 2,]$time),
y = 2250,
label = c('Training Period', 'Validation Period'))
Objective: I want to add a forecast error band, from the 5th to the 95th percentile of the residuals, on either side of the predicted line (dotted red in the figure) and shade that region.
I used quantile to produce the forecast range
q = quantile(ridership.lm.pred$residuals, c(.05, .95))
percentile_5 = as.numeric(q[1])
percentile_95 = as.numeric(q[2])
Adding the 5th and 95th percentile errors to the forecast values
yl = forecast_df$value + percentile_5
ym = forecast_df$value + percentile_95
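As a side note, the bounds can also be kept as columns of the data frame handed to geom_ribbon, so that they are looked up by name in the same rows as the forecast. The following is only a minimal sketch on made-up numbers: the dates, forecast values, percentile values, and the column names lower and upper are all hypothetical stand-ins, not the real ridership results.

```r
library(ggplot2)

# Hypothetical stand-in for forecast_df: 36 monthly forecast values
forecast_df <- data.frame(
  time  = seq(as.Date("2001-04-01"), by = "month", length.out = 36),
  value = seq(2000, 2200, length.out = 36)
)

# Hypothetical stand-ins for the 5th and 95th percentile residuals
percentile_5  <- -150
percentile_95 <- 150

# Store the bounds next to the forecast so every row carries its own band
forecast_df$lower <- forecast_df$value + percentile_5
forecast_df$upper <- forecast_df$value + percentile_95

# fill is a constant here, so it is set outside aes()
p <- ggplot(forecast_df, aes(x = time, y = value)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "gray30", alpha = 0.3) +
  geom_line(colour = "red", linetype = "dotted")
```

Printing p draws the dotted line with the band behind it. This toy plot sets no y-axis limits.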
Problem: If I use the command below, the shaded region is not displayed for the complete validation period.
p3 + geom_ribbon(data = forecast_df,
aes(ymin = yl,
ymax = ym),
fill="gray30")
NROW(yl)
[1] 36
sum(is.na(yl))
[1] 0
NROW(ym)
[1] 36
sum(is.na(ym))
[1] 0
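One further check that might be worth making, since neither vector contains NA: p1 sets ylim(1300, 2300), and limits set through ylim() turn values outside the range into NA when the plot is built, whereas coord_cartesian(ylim = ...) only zooms the view. I have not confirmed that this is what happens with my data. The sketch below uses hypothetical upper bounds (ym_demo) only to show how the comparison could be done.

```r
# Hypothetical upper bounds standing in for ym
ym_demo <- c(2250, 2290, 2310, 2350)

# The limits passed to ylim() in p1
y_limits <- c(1300, 2300)

# TRUE where a bound falls outside the y-axis limits
outside <- ym_demo < y_limits[1] | ym_demo > y_limits[2]

# Positions of the bounds that would be turned into NA by the scale limits
which(outside)
```

Running the same comparison on the real yl and ym would show whether any part of the band lies outside the plotted range.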
Things Tried: If I replace ymin and ymax with other values, for example with the command below, I get the figure shown just below the command.
p3 + geom_ribbon(data = forecast_df,
aes(ymin = rep(1750,36),
ymax = rep(2000,36),
fill="gray30"))
My Questions:
Can anyone tell me the reason for the output in figure 2? Why is R giving such strange output there?
Can anyone help me shade the complete region using ggplot?