Here I plot values along the length of a chromosome:

[image: plot of the values along the chromosome with the loess line]

The middle region without points contains no data and should not get a loess line. How can I modify my code to stop the loess line from being drawn over this region? The data is continuous, but I could add rows that mark the blank region with some special value, or add a column with a label. How would I then use this in the command?

My current command:

library(IDPmisc)  # provides NaRV.omit()
library(ggplot2)

# plot settings (edit here)
spanv <- 0.05
pointcol1 <- "#E69F00"
pointcol2 <- "#56B4E9"
pointcol3 <- "#009E73"
points <- 20
linecol <- "green"
# onechr (chromosome name) and data are assumed to be defined earlier
xlabs <- paste(onechr, " position", " (loess-span=", spanv, ")", sep="")
ylabs <- 'E / A - ratio'

# keep only x and y for the relevant data and clean NA and Inf
data1 <- NaRV.omit(data[, c(2, 7)])

p1 <- ggplot(data1, aes(x=start, y=E.R)) +
  ylim(0, 5) +
  geom_point(shape=points, col=pointcol1, na.rm=TRUE) +
  # a constant colour goes outside aes(); inside aes() it is treated as data
  geom_hline(yintercept=1, col=linecol) +
  geom_smooth(method="loess", span=spanv, fullrange=FALSE, se=TRUE, na.rm=TRUE) +
  xlab(xlabs) +
  ylab(ylabs)
Andrie
splaisan

1 Answer

I would do one of two things:

  1. Do the loess() fitting outside of ggplot(), predict for the two regions separately and add each set of predictions to the plot with its own geom_line() layer.
  2. Similar to the above, but done entirely within ggplot(). Add two layers to the plot instead of one, both using geom_smooth(), and change the data argument supplied to each so that it refers to just one portion of the data.

For the latter, perhaps something like:

....
geom_smooth(data = data[1:n, ], method="loess", span=spanv, fullrange=FALSE, 
            se=TRUE, na.rm=TRUE) +
geom_smooth(data = data[m:k, ], method="loess", span=spanv, fullrange=FALSE, 
            se=TRUE, na.rm=TRUE)
....

where n, m and k are the row indices that mark the end of set 1 and the start and end of set 2; you need to define or supply these yourself.
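
For the first option, a minimal sketch might look like the following. It is an illustration rather than tested code for the real data: `data1` is simulated here, and `gap_start`, `gap_end` and the span of 0.3 are hypothetical values chosen for the example. The model is fitted once on all the data, and predictions are then made separately for each side of the gap.

```r
library(ggplot2)

# Simulated stand-in for data1 (assumption): no observations between 400 and 600
set.seed(1)
start <- c(seq(0, 400, by = 2), seq(600, 1000, by = 2))
data1 <- data.frame(start = start,
                    E.R   = 1 + 0.3 * sin(start / 50) +
                            rnorm(length(start), sd = 0.1))

# Hypothetical gap boundaries; replace with the real coordinates
gap_start <- 400
gap_end   <- 600

# One LOESS fit on all the data
fit <- loess(E.R ~ start, data = data1, span = 0.3)

# Predict separately for the region on each side of the gap
left  <- data.frame(start = seq(min(data1$start), gap_start, length.out = 200))
right <- data.frame(start = seq(gap_end, max(data1$start), length.out = 200))
left$E.R  <- predict(fit, newdata = left)
right$E.R <- predict(fit, newdata = right)

# One geom_line() layer per region, so nothing is drawn across the gap
p <- ggplot(data1, aes(x = start, y = E.R)) +
  geom_point(colour = "#E69F00") +
  geom_line(data = left,  colour = "blue") +
  geom_line(data = right, colour = "blue")
```

Printing `p` draws the plot. Because both line layers come from the same `fit` object, the curve on each side is the one a single full-data LOESS would give, just not drawn over the empty region.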

Gavin Simpson
  • thanks a lot for the answer. I will try this (the 2nd one looks nicer) – splaisan May 10 '12 at 10:07
  • The first is nicer as it involves only 1 call to `loess()` and most closely follows the original graphic you show. For the separate `data` objects in `geom_smooth()` do realise that the second LOESS need not be the same as the fit given for the LOESS applied to the full data set (edge effects etc). Option 1 is preferable to my mind as it should be the **same** model as the one `ggplot()` fitted, you are just plotting the bits you wanted. – Gavin Simpson May 10 '12 at 11:27
  • @splaisan A third option springs to mind. Do as option 1 but predict for the whole range of data. Then set any points in the "gap" to `NA`. Then plot the predicted vector of points in a single layer. That would be another ideal way to go before choosing option 2. – Gavin Simpson May 10 '12 at 11:28
  • That would be great if the gaps were not regions of X where the data are single x positions, each representing a range 2000 wide. I have several gaps on the graph; only the central one is evident, but others exist. This means I cannot easily use the manual solution proposed by @GavinSimpson (I have 24 such graphs to plot from many different experiments, and manual is not an option). I am close to it but not yet there. – splaisan May 15 '12 at 07:51
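
The third option from the comments can be combined with automatic gap detection, so that no indices have to be entered by hand for each graph. The sketch below is one possible approach, not something proposed in the thread. The data are simulated, and the bin width `bin`, the threshold `max_step` and the span of 0.2 are all assumptions made for the example. A gap is taken to be any jump between consecutive x positions larger than `max_step`.

```r
library(ggplot2)

# Simulated stand-in for data1 (assumption): one x position per 2000-wide bin,
# with two empty stretches
set.seed(2)
bin   <- 2000
start <- c(seq(0,      200000, by = bin),
           seq(300000, 500000, by = bin),
           seq(560000, 700000, by = bin))
data1 <- data.frame(start = start,
                    E.R   = 1 + 0.3 * sin(start / 5e4) +
                            rnorm(length(start), sd = 0.1))
data1 <- data1[order(data1$start), ]

# Detect gaps: jumps between neighbouring positions above an assumed threshold
max_step <- 5 * bin
jumps <- which(diff(data1$start) > max_step)
gaps  <- data.frame(from = data1$start[jumps],
                    to   = data1$start[jumps + 1])

# One LOESS fit, predicted over the whole range
fit  <- loess(E.R ~ start, data = data1, span = 0.2)
pred <- data.frame(start = seq(min(data1$start), max(data1$start), by = bin))
pred$E.R <- predict(fit, newdata = pred)

# Set predictions that fall strictly inside any gap to NA
in_gap <- vapply(pred$start,
                 function(x) any(x > gaps$from & x < gaps$to),
                 logical(1))
pred$E.R[in_gap] <- NA

# A single geom_line() layer; the line is broken wherever E.R is NA
p <- ggplot(data1, aes(x = start, y = E.R)) +
  geom_point(colour = "#E69F00") +
  geom_line(data = pred, colour = "blue")
```

The same code can be wrapped in a function and applied to each of the data sets in turn. The value of `max_step` decides what counts as a gap, so it would need tuning against the real bin spacing.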