
I want to produce an HTML document using knitr/rmarkdown. Currently, the file is over 20MB and I'm trying to find a way to reduce its size. The large file size is probably due to my plots, which contain a lot of points.

If I change my output type to PDF, I can get it down to 1.7MB. Is there a way to reduce the file size while keeping HTML output?

EDIT: Here's a minimal working example, which I ran in RStudio.

---
title: "Untitled"
author: "My Name"
date: "September 7, 2015"
output: html_document
---

```{r}
library(ggplot2)
knitr::opts_chunk$set(dev='svg')
```

```{r}
set.seed(1)
mydf <- data.frame(x=rnorm(2e4),y=rnorm(2e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
```

I also noticed that if I have too many observations, the plot doesn't get generated at all. I just get an empty box with a question mark in the output.

```{r}
set.seed(2)
mydf <- data.frame(x=rnorm(5e4),y=rnorm(5e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
# ...plot doesn't appear in output

```

Maria Reyes
  • Scatter plots with **many** observations can be heavier in vector format compared to a raster image, but if that's not an issue with your current project, you might try SVG or lower resolution images. – daroczig Sep 07 '15 at 05:41
  • @daroczig I tried svg and it did help a bit, but could you describe how to lower the resolution because I'd like to reduce my file size further and I don't mind if I have to sacrifice image quality. I edited my post by adding a minimum working example. – Maria Reyes Sep 07 '15 at 17:04
  • If using `knitr`, see the [chunk options](http://yihui.name/knitr/options/), especially the `dpi` setting. – daroczig Sep 07 '15 at 23:14

2 Answers


Following the suggestion of @daroczig to use the "dpi" knitr chunk option, I modified your code as follows (see below).

  • You had set the dev chunk option to "svg", which produces very large vector graphics files, especially for images made up of many elements (points, lines, etc.).
  • I set the dev chunk option back to "png", which is the default raster graphics format for HTML output, so normally you don't need to touch it at all. Keeping dev equal to "png" dramatically reduces the HTML output file size.
  • I set the dpi chunk option to 36 (knitr's default is 72) to lower the image resolution and decrease the HTML output file size further.
  • I set the out.width and out.height chunk options to "600px" to increase the displayed image dimensions.
  • You can adjust the dpi, out.width, and out.height options until the HTML output file size and the image dimensions are what you want. There's a trade-off between output file size and image resolution.

After knitting the code, I got an HTML output file size equal to 653kB, even when plotting 5e4 data points.

---
title: "Change size of output HTML file by reducing resolution of plot image"
author: "My Name"
date: "September 7, 2015"
output: html_document
---

```{r}
# load ggplot2 silently
suppressWarnings(library(ggplot2))
# chunk option dev="svg" (used in the question) produces very large vector graphics files:
# knitr::opts_chunk$set(dev="svg")
# chunk option dev="png" is the default raster graphics format for HTML output
knitr::opts_chunk$set(dev="png")
```

```{r, dpi=36, out.width="600px", out.height="600px"}
# chunk option dpi=72 is the default resolution
set.seed(1)
mydf <- data.frame(x=rnorm(5e4),y=rnorm(5e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
```
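
The effect of the resolution on file size can also be checked outside knitr. The following is a minimal sketch, not part of the original answer: it uses base `plot()` instead of ggplot2 so that it needs no extra packages, and `render_png` is a hypothetical helper name. The `res` argument of `png()` plays the role of the `dpi` chunk option here.

```r
# Render the same 5e4-point scatter plot as PNG at two resolutions
# and compare the resulting file sizes.
set.seed(1)
mydf <- data.frame(x = rnorm(5e4), y = rnorm(5e4))

# Hypothetical helper: draws the plot to a temporary PNG file at the
# given resolution (pixels per inch) and returns the file size in bytes.
render_png <- function(res) {
  path <- tempfile(fileext = ".png")
  # 7 x 7 inch canvas, so the pixel dimensions scale with `res`
  png(path, width = 7, height = 7, units = "in", res = res)
  plot(mydf$x, mydf$y, pch = 16, col = rgb(0, 0, 0, 0.6))
  dev.off()
  file.size(path)
}

sizes <- c(dpi36 = render_png(36), dpi72 = render_png(72))
print(sizes)
```

The exact byte counts depend on the graphics device and platform, but the lower-resolution file should come out noticeably smaller.
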
algoquant
  • It does reduce the size and resolution of the embedded images, but nonetheless, the HTML file remains huge because of long Javascripts added in the head of the file. Are these really necessary? – Denis Cousineau May 11 '22 at 11:56
  • @DenisCousineau When I knit the R code on my Mac I get an html file of 824kb. I inspected the html file but I didn't see much Javascript in there. – algoquant May 11 '22 at 23:26

To prevent scatterplots with many points from blowing up the size of your vector graphics (and, accordingly, your HTML output), you can use geom_point_rast() from the ggrastr package. It rasterizes only the point layer, while axes, labels, and text stay in vector format. Eat the cake and have it too!
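
A minimal sketch of how this might look, assuming the ggrastr package is installed (rendering with its default device also needs Cairo support). The `raster.dpi = 72` value is only an illustrative choice, not a recommendation from the package:

```r
library(ggplot2)
library(ggrastr)

set.seed(1)
mydf <- data.frame(x = rnorm(5e4), y = rnorm(5e4))

# geom_point_rast() is a drop-in replacement for geom_point():
# only the point layer is rasterized, at the resolution given by raster.dpi.
p <- ggplot(mydf, aes(x, y)) +
  geom_point_rast(alpha = 0.6, raster.dpi = 72)
```

Printing `p` in a chunk that uses `dev='svg'` should then embed the points as a bitmap inside the SVG rather than as tens of thousands of individual vector elements.
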

jan-glx