
I am using ggplot2's geom_smooth to plot turnover data of a customer group from the previous year against the current year (based on calendar weeks). As the last week is not complete, I would like to use a dashed linetype for the last week. However, I can't figure out how to do that. I can change the linetype for the entire plot or for an entire series, but not within a series (depending on the value of x):

[image: target graph]

To keep it simple, let's just use the following example:

library(ggplot2)

set.seed(42)
frame <- data.frame(series = rep(c('a','b'),50),x = 1:100, y = runif(100))

# one smoothed line per series, without the confidence band
ggplot(frame,aes(x = x,y = y, group = series, color=series)) + 
  geom_smooth(size=1.5, se=FALSE)

How would I have to change this to get dashed lines for x >= 75?

The goal would be something like this: example

Thx very much for any help!

Edit, 2016-03-05

Of course I fail when trying to use this method on the original plot. The problem lies with the ribbon, which is calculated using stat_summary and a predefined function. I tried to use stat_summary on the original data (mdf) and geom_line on the smooth_data. Even when I comment out everything else, I still get "Error: Continuous value supplied to discrete scale". I believe the problem comes from the fact that the original x value (Kalenderwoche) is discrete, whereas the new, smoothed x is continuous. Do I have to somehow transform one into the other? What else could I do?

Here is what I tried (condensed to the essential lines):

quartiles <- function(x) {  
  x <- na.omit(x) # remove NA values
  median <- median(x)
  q1 <- quantile(x,0.25)
  q3 <- quantile(x,0.75)
  data.frame(y = median, ymin = median, ymax = q3)
}

g <- ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
geom_smooth(size=1.5, method="auto", se=FALSE)

# Take out the data for smooth line
smooth_data <- ggplot_build(g)$data[[1]]

ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
  stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)+
  geom_line(data=smooth_data, aes(x=x, y=y, group=group, colour=group, fill=group))  

mdf looks like this:

str(mdf)
'data.frame':   280086 obs. of  5 variables:
 $ konto_id     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Kalenderwoche: Factor w/ 14 levels "2015-48","2015-49",..: 4 12 1 3 7 13 10 6 5 9 ...
 $ variable     : Factor w/ 2 levels "Umsatz","Umsatz Vorjahr": 1 1 1 1 1 1 1 1 1 1 ...
 $ value        : num  0 428.3 97.8 76 793.1 ...

There are many accounts (konto_id), and for each account and calendar week (Kalenderwoche), there is a current turnover value (Umsatz) and a turnover value from last year (Umsatz Vorjahr). I can provide a smaller version of the data.frame and the entire code, if required.

Thx very much for any help!

P.S. I am a total novice in R, so my code probably looks rather stupid to pros, sorry for that :(

Edit, 2016-03-06

I have uploaded a subset of the data (mdf): mdf

The full code of the original graph is the following (looking somewhat weird with so little data, but that's not the point ;)

library(dtw)
library(reshape2)
library(ggplot2)
library(RODBC)
library(Cairo)

# custom breaks for X axis
breaks.custom <- unique(mdf$Kalenderwoche)[c(TRUE,rep(FALSE,0))] 

# function called by stat_summary
quartiles <- function(x) {  
  x <- na.omit(x)
  median <- median(x)
  q1 <- quantile(x,0.25)
  q3 <- quantile(x,0.75)
  data.frame(y = median, ymin = median, ymax = q3)
}

# Positions for guidelines and labels
horizontal.center <- (length(unique(mdf$Kalenderwoche))+1)/2
kw.horizontal.center <- as.vector(sort(unique(mdf$Kalenderwoche))[c(horizontal.center-0.5,horizontal.center+0.5)])
vpos.P75.label <- max(quantile(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]],0.75)
                      ,quantile(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]],0.75))+10
# use the higher P75 value of the two weeks around the center
vpos.mean.label <- min(mean(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]])
                       ,mean(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]]))-10
vpos.median.label <- min(median(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]])
                         ,median(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]]))-10

hpos.vline <- which(as.vector(sort(unique(mdf$Kalenderwoche))=="2016-03"))

# custom colour palette (2 colors)
cbPaletteLine <- c("#DA2626", "#2626DA")
cbPaletteFill <- c("#F0A8A8", "#7C7CE9")


# ggplot
ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
  geom_smooth(size=1.5, method="auto", se=FALSE)+ 
  # se=FALSE suppresses drawing of the SE of the fit. The SE of the data shall be used instead:
  stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)+
  scale_x_discrete(breaks=breaks.custom)+
  scale_colour_manual(values=cbPaletteLine)+
  scale_fill_manual(values=cbPaletteFill)+
  #coord_cartesian(ylim = c(0, 250)) +
  theme(legend.title = element_blank(), title = element_text(face="bold", size=12))+
  #scale_color_brewer(palette="Dark2")+
  labs(title = "Tranche 1", x =  "Kalenderwoche", y = "Konto-Umsatz [CHF]")+
  geom_vline(xintercept = hpos.vline, linetype=2)+
  annotate("text", x=horizontal.center, y=vpos.median.label, label = "Median", size=4)+
  annotate("text", x=horizontal.center, y=vpos.mean.label, label= "Mean", size=4)+  
  annotate("text", x=horizontal.center, y=vpos.P75.label, label = "P75%", size=4)+
  theme(axis.text.x=element_text(angle = 90, hjust = 0.5, vjust = 0.5))

Edit, 2016-03-06

The final plot now looks like this (thx, Jason!!): [image: final plot]

Pesche Helfer

1 Answer


I am not sure how to smooth all the data and still use different line types for subsets with geom_smooth alone. My idea is to pull out the data that ggplot computed for the smooth line and redraw it with geom_line. This is how I did it:

set.seed(42)
frame <- data.frame(series=rep(c('a','b'), 50),
                    x = 1:100, y = runif(100))
library(ggplot2)
g <- ggplot(frame, aes(x=x, y=y, color=series)) + geom_smooth(se=FALSE) 

# Take out the data for smooth line
smooth_data <- ggplot_build(g)$data[[1]]
ggplot(smooth_data[smooth_data$x <= 76, ], aes(x=x, y=y, color=as.factor(group), group=group)) +
  geom_line(size=1.5) +
  geom_line(data=smooth_data[smooth_data$x >= 74, ], linetype="dashed", size=1.5) +
  scale_color_discrete("Series", breaks=c("1", "2"), labels=c("a", "b"))

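As a comment on this answer suggests, the curve can also be fitted directly instead of being extracted with ggplot_build. The following is only a minimal sketch under stated assumptions: it uses loess with default settings (which may differ slightly from what geom_smooth chose), and the names smooth_one, cutoff and part are hypothetical helpers introduced for illustration. Both segments include the cutoff point, so the solid and dashed parts join without a gap.

```r
library(ggplot2)

set.seed(42)
frame <- data.frame(series = rep(c("a", "b"), 50), x = 1:100, y = runif(100))

cutoff <- 75  # hypothetical threshold: dashed from here on

# Fit one loess curve for a single series and evaluate it on two grids
# that both contain the cutoff, so the two segments meet exactly there.
smooth_one <- function(d, cutoff, n = 50) {
  fit <- loess(y ~ x, data = d)
  out <- rbind(
    data.frame(x = seq(min(d$x), cutoff, length.out = n), part = "solid"),
    data.frame(x = seq(cutoff, max(d$x), length.out = n), part = "dashed")
  )
  out$y <- predict(fit, newdata = data.frame(x = out$x))
  out$series <- d$series[1]
  out
}

smoothed <- do.call(
  rbind,
  lapply(split(frame, frame$series), smooth_one, cutoff = cutoff)
)

# linewidth needs ggplot2 >= 3.4.0; on older versions use size instead.
p <- ggplot(smoothed, aes(x = x, y = y, colour = series, linetype = part,
                          group = interaction(series, part))) +
  geom_line(linewidth = 1.5) +
  scale_linetype_identity()  # "solid"/"dashed" are used as literal linetypes
p
```

Because the linetype is a column of the data, the threshold can be changed in one place without touching the plotting code.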

You're right. The problem is that you add a layer with a continuous x to a plot whose original layer has a discrete x. One way to deal with this is to create a lookup table, which is easy in this case because x is a sequence from 1 to 14: the discrete x can be recovered by indexing into the factor levels. Your code should work if you add:

level <- levels(mdf$Kalenderwoche)
ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
  stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25) +
  geom_line(data=smooth_data, aes(x=level[x], y=y, group=group, colour=as.factor(group), fill=NA)) 
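To illustrate what the lookup does, here is a small base R sketch. The week labels and x_smooth values are made-up examples, not taken from mdf. Note that indexing a character vector with non-integer numbers truncates them towards zero, so several smoothed x positions can map to the same week.

```r
# Hypothetical factor of calendar weeks (stand-in for mdf$Kalenderwoche)
kw <- factor(c("2015-48", "2015-49", "2015-50"))
level <- levels(kw)

# A discrete scale places the levels at integer positions 1, 2, 3, ...
positions <- as.integer(kw)

# Smoothed x values lie between those integer positions
x_smooth <- c(1, 1.4, 2.7, 3)

# Indexing truncates towards zero: 1.4 -> 1, 2.7 -> 2
looked_up <- level[x_smooth]
looked_up
```

If the truncation is undesirable, one possible alternative (an assumption, not part of the answer above) is to keep x numeric in all layers and only relabel the axis, for example with scale_x_continuous(breaks = seq_along(level), labels = level).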

Here is my attempt for the question:

g <- ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable)) +
  geom_smooth(size=1.5, method="auto", se=FALSE) + 
  # se=FALSE suppresses drawing of the SE of the fit. The SE of the data shall be used instead:
  stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)    

smooth_data <- ggplot_build(g)$data[[1]]
ribbon_data <- ggplot_build(g)$data[[2]]    

# Use them as lookup table
level <- levels(mdf$Kalenderwoche)
clevel <- levels(mdf$variable)    

ggplot(smooth_data[smooth_data$x <= 13, ], aes(x=level[x], y=y, group=group, color=as.factor(clevel[group]))) +
  geom_line(size=1.5) + 
  geom_line(data=smooth_data[smooth_data$x >= 13, ], linetype="dashed", size=1.5) +
  geom_ribbon(data=ribbon_data,
              aes(x=x, ymin=ymin, ymax=ymax, fill=as.factor(clevel[group]), color=NA), alpha=0.25) +
  scale_x_discrete(breaks=breaks.custom) +
  scale_colour_manual(values=cbPaletteLine) +
  scale_fill_manual(values=cbPaletteFill) +
  #coord_cartesian(ylim = c(0, 250)) +
  theme(legend.title = element_blank(), title = element_text(face="bold", size=12))+
  #scale_color_brewer(palette="Dark2")+
  labs(title = "Tranche 1", x =  "Kalenderwoche", y = "Konto-Umsatz [CHF]")+
  geom_vline(xintercept = hpos.vline, linetype=2)+
  annotate("text", x=horizontal.center, y=vpos.median.label, label = "Median", size=4)+
  annotate("text", x=horizontal.center, y=vpos.mean.label, label= "Mean", size=4)+  
  annotate("text", x=horizontal.center, y=vpos.P75.label, label = "P75%", size=4)+
  theme(axis.text.x=element_text(angle = 90, hjust = 0.5, vjust = 0.5))


Note that the legend keys now have a border line.

JasonWang
  • Interesting solution, thank you very much! I will try to port this to the somewhat more complex original graph. If I fail, I will come back again :) – Pesche Helfer Mar 05 '16 at 17:03
  • stat_smooth() uses loess to smooth a dataset of this size. If you wanted to use the same smooth function throughout, you could smooth the data using locfit or loess and then plot the smoothed data as separate line graphs, broken as above. – admccurdy Mar 05 '16 at 17:05
  • No problem, @PescheHelfer! – JasonWang Mar 05 '16 at 17:21
  • Of course, I utterly failed when trying to apply this to the original graph (see edit). Any further help would be greatly appreciated! Thank you very much :) – Pesche Helfer Mar 05 '16 at 19:57
  • @Jason: although of course your solution works perfectly well for the example, I removed the "solved" tag from your answer, because otherwise the question would be considered solved and I would probably not get anymore help. Sorry :( – Pesche Helfer Mar 05 '16 at 20:41
  • Can you provide us with a subset of your data frame? I tried to rebuild it, but it's quite hard. Thanks! – JasonWang Mar 05 '16 at 22:19
  • Yes, of course. I have also added the full code of the original graph (see Edit, 2016-03-06). Thx very much for your help! – Pesche Helfer Mar 06 '16 at 03:44
  • @PescheHelfer, I updated my code. You can check it and feel free to ask if you have any problem. The trick is similar to the one I posted earlier. (I don't know whether the border line of the legend will be a problem or not, but I haven't figured it out yet.) – JasonWang Mar 06 '16 at 06:43
  • @Jason, perfect, that's amazing, thanks a ton! The border line around the legend looks cool, actually, so no problem there :) I have one more question, to which you might know the answer: Is there a way to change the scaling of the linetype independently of the thickness of the line? (Of course I can use dotted instead of dashed, but just as a general question.) I couldn't figure that one out myself, either. – Pesche Helfer Mar 06 '16 at 15:12
  • @PescheHelfer, I am confused about the "scaling of the linetype", but the answer is probably on the ggplot2 website: http://docs.ggplot2.org/current/aes_linetype_size_shape.html. – JasonWang Mar 06 '16 at 16:47
  • @Jason, I meant to make smaller or larger gaps between the dashes. I tried different things (also checked the website), but couldn't make anything work. However, when using the Cairo device, the scaling seems to be somewhat different and the dashes now look as intended (more like dotted, but that's fine). I also got rid of the boxes around the legend, because on my machine it also created diagonals. To do so, I modified the code as follows: – Pesche Helfer Mar 06 '16 at 17:11
  • geom_ribbon(data=ribbon_data, show_guide=FALSE, # prevents boxes and diagonals around the legends (also removes the color boxes, though) aes(x=x, ymin=ymin, ymax=ymax, fill=clevel[group], color=NA), alpha=0.25) + – Pesche Helfer Mar 06 '16 at 17:12
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/105497/discussion-between-jasonwang-and-pesche-helfer). – JasonWang Mar 06 '16 at 17:39
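
On the dash-spacing question raised in the comments: one option that may help is to give the linetype as a string of hex digits, which sets the on/off lengths of the pattern explicitly. This is a minimal sketch with made-up data. The patterns "22", "44" and "1343" are arbitrary examples, and on most devices the unit lengths still scale with the line width, so a thicker line needs smaller digits to keep the same visual gap.

```r
library(ggplot2)

# Three horizontal lines, each drawn with its own custom dash pattern.
# Each pair of hex digits is "units on, units off".
df <- data.frame(
  x = rep(1:10, 3),
  y = rep(1:3, each = 10),
  pattern = rep(c("22", "44", "1343"), each = 10)
)

# linewidth needs ggplot2 >= 3.4.0; on older versions use size instead.
p_dash <- ggplot(df, aes(x = x, y = y, group = pattern, linetype = pattern)) +
  geom_line(linewidth = 1.5) +
  scale_linetype_identity()  # use the strings as literal linetypes
p_dash
```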