Problems with ggplot and pgfSweave

Question

I started using Sweave some time ago. However, like most people, I soon ran into a major problem: speed. Sweaving a large document takes ages, which makes working efficiently quite challenging. Data processing can be sped up considerably with cacheSweave. However, plots, especially ggplot ones ;), still take too long to render. That's why I want to use pgfSweave.

After many, many hours, I finally succeeded in setting up a working system with Eclipse/StatET/Texlipse. I then wanted to convert an existing report to pgfSweave and had a nasty surprise: most of my ggplots don't seem to work anymore. The following plot, for example, works perfectly in the console and in Sweave:

pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)

Running it with pgfSweave, however, I get this error:

Error in if (width > 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In if (width > 0) { :
  the condition has length > 1 and only the first element will be used
Error in driver$runcode(drobj, chunk, chunkopts) : 
  Error in if (width > 0) { : missing value where TRUE/FALSE needed

When I remove aes(...) from geom_point, the plot works perfectly with pgfSweave.

pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point()
print(pl)

Edit: I investigated further and narrowed the problem down to the tikz device.

This works just fine:

quartz()
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)

This gives the above error:

tikz('myPlot.tex', standAlone = TRUE)
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)
dev.off()

This works just fine as well:

tikz('myPlot.tex', standAlone = TRUE)
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point()
print(pl)
dev.off()

I could reproduce this with five different ggplots: whenever the mapping does not use colour (or size, alpha, ...), the plot works with tikz.

Q1: Does anybody have an explanation for this behavior?

Additionally, caching of non-plot code chunks doesn't work very well. The following code chunk takes no time at all with Sweave, but with pgfSweave it takes approximately 10 seconds.

<<plot.opts,echo=FALSE,results=hide,cache=TRUE>>=
#colour and plot options are globally set
pal1 <- brewer.pal(8,"Set1")
pal_seq <- brewer.pal(8,"YlOrRd")
pal_seq <- c("steelblue1","tomato2")
opt1 <- opts(panel.grid.major = theme_line(colour = "white"),panel.grid.minor = theme_line(colour = "white"))
sca_fill_cont_opt <- scale_fill_continuous(low="steelblue1", high="tomato2")
ory <- geom_hline(yintercept=0,alpha=0.4,linetype=2) 
orx <- geom_vline(xintercept=0,alpha=0.4,linetype=2)
ts1 <- 2.3
ts2 <- 2.5
ts3 <- 2.8
ps1 <- 6
offset_x <- function(x,y) 0.15*x/pmax(abs(x),abs(y))
offset_y <- function(x,y) 0.05*y/pmax(abs(x),abs(y))
plot_size <- 50*50
@

This seems strange as well, since the chunk only sets some variables for later use.

Q2: Does anybody have an explanation for that?

Q3: More generally, is anybody using pgfSweave successfully at all? By successfully I mean that everything that works in Sweave also works in pgfSweave, with the added benefit of nice fonts and improved speed. ;)

Thanks very much for responses!

I don't use Sweave, so I can't comment on that, but I can say that ggplot is slow: this is a known issue when plotting data with more than 1000 points (sometimes fewer). If you're looking for "faster" graphing, try lattice or base graphics. They won't be as pretty out of the box, though. – Brandon Bertelsen, Nov 17 '10 at 23:09
... and normally, speed isn't much of an issue when plotting. Unless you want to use Sweave... ;) – donodarazao, Nov 18 '10 at 18:51
Hi donodarazao, I am one of the authors of the tikzDevice. I will try to reproduce your ggplot problems to see if there is a fix. If you could save `elevation`, `area` and `que_id` to an RData file and send a download link to the email address listed in the package entry on CRAN, it would help. I will also forward this question to Cameron; he may have some ideas concerning the pgfSweave issues. – Sharpie, Nov 18 '10 at 21:20

Sharpie · Accepted Answer · 2010-11-19T15:49:48.870

Q1: Does anybody have any explanations for this behavior?

There are three reasons why the tikzDevice gives an error when trying to construct your plot:

When you add an aesthetic mapping that creates a legend, such as aes(colour=que_id), ggplot2 uses the variable name as the legend title, in this case que_id.
The tikzDevice passes all strings, such as legend titles, to LaTeX for typesetting.
In LaTeX, the underscore character _ denotes a subscript. An underscore used outside math mode causes an error.

When the tikzDevice tries to calculate the height and width of the legend title "que_id", it passes the string to LaTeX for typesetting and expects LaTeX to return the string's width and height. LaTeX fails because the string contains an unescaped underscore outside math mode. The tikzDevice therefore receives no valid number for the string width, so the if (width > 0) check fails.
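Because any LaTeX special character in a legend title or axis label can trigger the same failure, it may help to check column names before plotting. The following is a minimal base-R sketch; find_latex_unsafe() is a hypothetical helper (not part of tikzDevice or ggplot2), and the set of characters it checks (_ % $ # & { } ^) is an assumption covering the usual LaTeX specials:

```r
# Hypothetical helper, not part of tikzDevice or ggplot2: return the strings
# that contain a LaTeX special character not preceded by a backslash.
find_latex_unsafe <- function(labels) {
  # (start of string OR any character except a backslash), then a special
  pattern <- "(^|[^\\\\])[_%$#&{}^]"
  labels[grepl(pattern, labels, perl = TRUE)]
}

# Column names that would become legend titles
find_latex_unsafe(c("elevation", "area", "que_id"))  # returns "que_id"
```

Running it on names(plot_info) before calling tikz() would flag que_id without needing a LaTeX run.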

Ways to avoid the problem

Specify a legend title by adding a colour scale:

p1 <- ggplot(plot_info, aes(elevation, area))
p1 <- p1 + geom_point(aes(colour=que_id))

# Add a name that is easier for humans to read than the variable name
p1 <- p1 + scale_colour_brewer(name="Que ID")

# Or, replace the underscore with the appropriate LaTeX escape sequence
p1 <- p1 + scale_colour_brewer(name="que\\textunderscore id")

Use the string sanitization feature introduced in tikzDevice 0.5.0 (it was broken until 0.5.2). By default, sanitization currently escapes only the following characters: %, $, {, }, and ^. However, you can specify additional substitution pairs via the tikzSanitizeCharacters and tikzReplacementCharacters options:

# Add underscores to the sanitization list
options(tikzSanitizeCharacters = c('%','$','}','{','^', '_'))
# '\\_{}' rather than '\\textunderscore': a bare control word would absorb
# the letters that follow it (que_id would become \textunderscoreid)
options(tikzReplacementCharacters = c('\\%','\\$','\\}','\\{',
  '\\^{}', '\\_{}'))

# Turn on string sanitization when starting the plotting device
tikz('myPlot.tex', standAlone = TRUE, sanitize = TRUE)
print(p1)
dev.off()
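If your tikzDevice version predates working sanitization (0.5.2, as noted above), a similar effect can be had by escaping strings yourself before handing them to ggplot2. This is a base-R sketch under that assumption; escape_tex() is hypothetical, and its character/replacement pairs mirror the options above rather than tikzDevice's actual defaults:

```r
# Hypothetical helper, not part of tikzDevice: escape LaTeX specials by
# applying each from -> to pair in order, with fixed (non-regex) matching.
# Braces are escaped before '^' and '_' so that the "{}" added by those
# replacements is not escaped a second time.
escape_tex <- function(x,
                       from = c("%", "$", "}", "{", "^", "_"),
                       to   = c("\\%", "\\$", "\\}", "\\{", "\\^{}", "\\_{}")) {
  stopifnot(length(from) == length(to))
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

escape_tex("que_id")  # the R string "que\\_{}id", i.e. que\_{}id
```

Usage would look like p1 + scale_colour_brewer(name = escape_tex("que_id")). Backslashes already present in the input are left untouched, so the helper is only suitable for plain text, not for strings that already contain LaTeX markup.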

We will be publishing version 0.5.3 of the tikzDevice in the next couple of weeks in order to address some annoying warning messages that now show up due to changes in the way R handles system(). I will add the following changes to this next version:

A better warning message when the width is NULL, indicating that something is probably wrong with the plot text.
Adding underscores and a few other characters to the default set of characters the string sanitizer looks for.

Hope this helps!

Nice! I applied names(df) <- gsub("_", ".", names(df)) to all data frames after reading them and adapted the report. The underscores came from exporting the data from MS Access, which does not allow '.' in field names. Now it works fine... there's still a lot of tweaking to do, but technically everything is good. Thanks for the support and for looking into my data! :) – donodarazao, Nov 19 '10 at 15:48
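The renaming described in this comment can be wrapped in a small function and applied to every data frame at once. A minimal sketch; dot_names() and the list `tables` (with its illustrative values) are hypothetical names used only for this example:

```r
# Hypothetical helper: replace every underscore in the column names with a dot
dot_names <- function(df) {
  names(df) <- gsub("_", ".", names(df), fixed = TRUE)
  df
}

# `tables` stands in for the data frames read from the MS Access export
tables <- list(
  plot_info = data.frame(elevation = c(410, 520), area = c(1.5, 2.0),
                         que_id = c("a", "b"))
)
tables <- lapply(tables, dot_names)
names(tables$plot_info)  # "elevation" "area" "que.id"
```

A dot has no special meaning in LaTeX text, so the renamed columns can be used as legend titles without further escaping.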

score 3 · Answer 2 · answered Nov 18 '10 at 23:47

Q2: I am the maintainer of pgfSweave.

Here are the results of a test I ran:

time R CMD Sweave time-test.Rnw 

real    0m1.133s
user    0m1.068s
sys     0m0.054s

time R CMD pgfsweave time-test.Rnw 

real    0m2.941s
user    0m2.413s
sys     0m0.364s

time R CMD pgfsweave time-test.Rnw 

real    0m2.457s
user    0m2.112s
sys     0m0.283s

I believe there are two reasons for the time difference, but it would take more work to verify them exactly:

pgfSweave does a ton of checking and double-checking to make sure that it is not redoing expensive computations. The goal is to make it feasible to do more expensive calculations and plotting within a document. "Expensive" in this sense means far more than the extra second or two spent on the checks.
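The checking described above can be illustrated with a toy version of checksum-based caching. This base-R sketch is NOT pgfSweave's (or cacheSweave's) implementation; cached_chunk() is hypothetical, it keys the cache only on an MD5 hash of the chunk's source text, and it ignores changes in the objects a chunk depends on, which a real caching system also has to track:

```r
# Hypothetical toy cache, not pgfSweave's implementation: objects created by a
# chunk are stored in an .rds file named after the MD5 hash of its source.
cached_chunk <- function(code, cache_dir = file.path(tempdir(), "chunk-cache"),
                         envir = parent.frame()) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  # Hash the chunk's source text (tools::md5sum works on files)
  src <- tempfile(fileext = ".R")
  writeLines(code, src)
  key <- unname(tools::md5sum(src))
  unlink(src)
  cache_file <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(cache_file)) {
    objs <- readRDS(cache_file)            # cache hit: skip evaluation
  } else {
    chunk_env <- new.env(parent = envir)   # evaluate in a scratch environment
    eval(parse(text = code), envir = chunk_env)
    objs <- as.list(chunk_env, all.names = TRUE)
    saveRDS(objs, cache_file)
  }
  list2env(objs, envir = envir)            # copy created objects to the caller
  invisible(names(objs))
}

# First call evaluates and stores; an identical second call reads the cache
cached_chunk("x <- Sys.sleep(1)")
cached_chunk("x <- Sys.sleep(1)")
```

Even in this toy version, every call pays for hashing and file access, so a chunk that only sets a few variables can end up slower with caching than without it.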

To see the real benefits of caching, consider the following test file:

\documentclass{article}

\begin{document}

<<plot.opts,cache=TRUE>>=
x <- Sys.sleep(10)
@

\end{document}

And the results:

time R CMD Sweave time-test2.Rnw 

real    0m10.334s
user    0m0.283s
sys     0m0.047s

time R CMD pgfsweave time-test2.Rnw 

real    0m12.032s
user    0m1.356s
sys     0m0.349s

time R CMD pgfsweave time-test2.Rnw 

real    0m1.423s
user    0m1.121s
sys     0m0.266s

Second, Sweave has undergone some changes in R 2.12. These changes may have sped up code chunk evaluation and left pgfSweave behind for smaller calculations like these. This is worth looking into.

Q3: I use pgfSweave all the time for my own work. Some changes to Sweave in R 2.12 have caused minor problems with pgfSweave, but a new version that fixes everything is forthcoming. The development version on GitHub (https://github.com/cameronbracken/pgfSweave) already has the changes. If you are having additional problems, I would be happy to help.

score 1 · Answer 3 · answered Nov 18 '10 at 12:45


Q2: Do you use \pgfrealjobname{<DOCUMENTNAME>} in the header and the option external=TRUE for the graphics chunks? I've found that this increases the speed a lot (not for the first compilation, but for subsequent ones if the graphics are unchanged). You'll find more background in the pgfSweave vignette.

Q3: Everything works fine for me; like you, I use Eclipse/StatET/Texlipse, on Windows.

fabians

Thanks for the answer. Q2: Yes, I used \pgfrealjobname{} and external=TRUE. However, the speed issue in Q2 was with a non-graphical chunk. Q3: It's nice to hear that it really IS possible to configure everything in a satisfying way... I guess I just have to do some more trying to get there. ;) – donodarazao Nov 18 '10 at 18:52
