Each cluster of data frame on separate page of pdf file

Question

I will use the mtcars data frame as an example. I would like to print two columns of the data frame on each page of a PDF file, depending on the cluster to which the rows belong.

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3

Let's assume that the column carb is the cluster column. As you can see, we have 4 different clusters, so what I would like to do is put the output of two columns (starting from the first one) for each cluster on a separate page of the PDF.

First page should look like:

Datsun 710 22.8
Hornet 4 Drive 21.4
Valiant 18.1

Second page should look like:

Hornet Sportabout 18.7
Merc 240D 24.4
Merc 230 22.8

Third page should look like:

Merc 450SE 16.4

I believe you already know what the last page should look like. Can you help me with that? My data frame contains around 250 clusters in total.

I think you can split the dataset with the subset columns using `lst <- split(mtcars[1], mtcars$carb)` and you can save the list output to a pdf — akrun, Aug 03 '15 at 07:24
Also check [here](http://stackoverflow.com/questions/15937131/print-to-pdf-file-using-grid-table-in-r-too-many-rows-to-fit-on-one-page) — akrun, Aug 03 '15 at 07:30
Sounds fair. Can you write an answer and show me how to save the output of list in pdf ? I would like to reward you for the solution. It looks fine. — Shaxi Liver, Aug 03 '15 at 07:36
I tried the `pdf('somefile.pdf');lapply(lst, grid.table)` but it is overwriting in a single page. I haven't tried the solution in the link though. If you can test it and get it working, you can post that as a solution. — akrun, Aug 03 '15 at 07:39

score 4 · Accepted Answer · answered Aug 03 '15 at 08:06

grid.newpage() forces each table to appear on a new page:

pdf("multipage.pdf")
lapply(split(mtcars, mtcars$carb), function(d) {
  grid::grid.newpage()
  gridExtra::grid.table(d)
})
dev.off()
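
The question asked for only the row names and the first column (mpg) on each page, while the code above prints the whole table. A minimal sketch of that variant, combining the `split(mtcars[1], mtcars$carb)` idea from the comments with `grid.newpage()`, might look like this. The output file name `mpg_by_carb.pdf` is an arbitrary choice for illustration.

```r
library(grid)
library(gridExtra)

# Keep only the mpg column; the car names stay attached as row names.
clusters <- split(mtcars["mpg"], mtcars$carb)

pdf("mpg_by_carb.pdf")   # hypothetical file name
for (d in clusters) {
  grid.newpage()         # start a fresh page for this cluster
  grid.table(d)          # draws the row names plus the single mpg column
}
invisible(dev.off())
```

Using a `for` loop instead of `lapply()` is only a stylistic choice here; it avoids printing the list that `lapply()` returns.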

baptiste


That's exactly what I was looking for. It outputs the whole table, but I can live with that. Do you know how I can change the font size in the PDF file, or how to make the table smaller? I can't see all of the rows. – Shaxi Liver Aug 03 '15 at 08:31
[it's now officially a FAQ](https://github.com/baptiste/gridextra/wiki#problems-with-gridtable) – baptiste Aug 03 '15 at 08:33
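
For the font-size question, one possible approach (not necessarily the one described in the linked FAQ) is to pass a theme with a smaller `base_size` to `grid.table()` and to break long clusters into blocks of rows, one block per page. This is a rough sketch; `rows_per_page` and the file name `multipage_small.pdf` are assumptions to be tuned for the actual data and page size.

```r
library(grid)
library(gridExtra)

small <- ttheme_default(base_size = 8)  # smaller text than the default theme
rows_per_page <- 25                     # assumed value; adjust to the page height

pdf("multipage_small.pdf")              # hypothetical file name
for (d in split(mtcars, mtcars$carb)) {
  # Break a long cluster into consecutive blocks of rows, one block per page.
  for (s in seq(1, nrow(d), by = rows_per_page)) {
    e <- min(s + rows_per_page - 1, nrow(d))
    grid.newpage()
    grid.table(d[s:e, , drop = FALSE], theme = small)
  }
}
invisible(dev.off())
```

With mtcars no cluster is longer than 25 rows, so every cluster still gets exactly one page; the inner loop only matters for larger groups.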

Christoph · Answer 2 · 2015-08-03T10:10:28.650

In addition to @baptiste's answer, here is a solution using rmarkdown. To print each table on a new PDF page, set `results='asis'` in the chunk options. With `cat("\n\n\\newpage")` and `cat("\n\n## ...")` you can then dynamically create page breaks and headers. `knitr::kable()` provides nicely formatted table output.

---
title: "mtcars_clusters"
author: "Your name"
date: "3 August 2015"
output: pdf_document
---

```{r, echo=FALSE, results='asis'}
DAT <- lapply(sort(unique(mtcars$carb)), function(cluster) {
  data <- subset(mtcars, carb == cluster)
  cat(paste("\n\n## Cluster", cluster))
  print(knitr::kable(data))
  cat("\n\n\\newpage")
})
```

EDIT: Example with a larger data set

With respect to your comment on @baptiste's answer, this markdown approach also handles larger data sets better. Here is an example with the ChickWeight dataset, using ChickWeight$Diet as the cluster variable:

---
title: "ChickWeight_clusters"
author: "Your name"
date: "3 August 2015"
output: pdf_document
---

```{r, echo=FALSE, results='asis'}
DAT <- lapply(sort(unique(ChickWeight$Diet)), function(cluster) {
  data <- subset(ChickWeight, Diet == cluster)
  cat(paste("\n\n## Cluster", cluster))
  print(knitr::kable(data))
  cat("\n\n\\newpage")
})
```

The output table is automatically split across pages, so you should see all rows. Also, if you only want to print specific columns, subset data inside the print(knitr::kable(data)) call accordingly.
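
As an illustration of that column subsetting, here is a sketch for the mtcars case that prints only mpg (plus the row names) per cluster. It is meant to go inside a chunk with `echo=FALSE, results='asis'`, like the ones above; the choice of mpg and the use of a `for` loop are my assumptions, not part of the original code.

```r
# Place this inside an R Markdown chunk with results='asis'.
for (cluster in sort(unique(mtcars$carb))) {
  data <- subset(mtcars, carb == cluster)
  cat("\n\n## Cluster", cluster)
  # Keep only mpg; drop = FALSE preserves the data frame and its row names.
  print(knitr::kable(data[, "mpg", drop = FALSE]))
  cat("\n\n\\newpage")
}
```

A `for` loop returns nothing, so there is no need to assign the result to a dummy variable to keep it out of the document.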
