I have just started using pdftools to extract images from PDFs. However, I have found that not all layers are reproduced. For example, in the code below the lines are reproduced in the PNG but not the points. Obviously, in this example I could just save the PNG directly; I'm only using it to illustrate a problem I have with other PDFs for which I don't have the source code or data.

Warning: the code below creates files in the C:\temp directory.

library(tidyverse)
library(pdftools)

set.seed(5)
df <- data.frame(
  Date = rep(as.Date(1:50, origin = "1990-01-01"), 2),
  value = c(1:50, 1:50) + c(rnorm(50), rnorm(50, sd = 5)),
  var = rep(c("a", "b"), each = 50)
)

plt1 <- ggplot(df, aes(x = Date, y = value, colour = var))+
  geom_line()+
  geom_point()

ggsave(plt1, filename = "C:/temp/testplot.pdf", width = 5, height = 4)

This creates a PDF with points and lines, as expected.

However, when I convert it to PNG, I do not get the points, only the lines:

pdf_convert("C:/temp/testplot.pdf", format = "png", filenames = "C:/temp/testpng.png")
#> Converting page 1 to C:/temp/testpng.png...
#> PDF error: No display font for 'ArialUnicode'
#>  done!
#> [1] "C:/temp/testpng.png"

Created on 2019-11-19 by the reprex package (v0.3.0)

I have also tried pdftools::pdf_render_page, as well as image_read_pdf and image_convert from the magick package, with the same results. However, I understand that the magick functions use pdftools internally, so the problem must lie there.
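For reference, those alternative attempts look roughly like the sketch below. It assumes the test PDF created above exists and that the png and magick packages are installed; the output file names are placeholders chosen for illustration, and the dpi/density of 150 is arbitrary.

```r
library(pdftools)
library(magick)

pdf_file <- "C:/temp/testplot.pdf"  # the test PDF created above

if (file.exists(pdf_file)) {
  # Route 1: render page 1 to a raw bitmap with pdftools, then write it as a PNG
  bitmap <- pdf_render_page(pdf_file, page = 1, dpi = 150)
  png::writePNG(bitmap, "C:/temp/testpng_render.png")  # placeholder file name

  # Route 2: read the page through magick and convert it to PNG
  img <- image_read_pdf(pdf_file, pages = 1, density = 150)
  img <- image_convert(img, format = "png")
  image_write(img, path = "C:/temp/testpng_magick.png")  # placeholder file name
}
```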

Sarah

1 Answer

Suggested workaround:

Open the PDF file in Adobe Acrobat. Select "File" -> "Print" -> "Microsoft Print to PDF" -> "Advanced", tick the "Print As Image" checkbox, then click "OK" -> "Print".

Then run pdf_convert on the new PDF copy you just created.
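That last step might look like the sketch below. The name of the printed copy (testplot_printed.pdf) is an assumption; use whatever name you chose in the print dialog. The output name and dpi = 150 are likewise arbitrary choices.

```r
library(pdftools)

# Hypothetical path: the copy produced by "Microsoft Print to PDF"
printed_pdf <- "C:/temp/testplot_printed.pdf"
# Hypothetical output name for the converted page
out_png <- "C:/temp/testpng_printed.png"

if (file.exists(printed_pdf)) {
  # Convert page 1 of the printed copy to PNG
  pdf_convert(printed_pdf, format = "png", pages = 1,
              filenames = out_png, dpi = 150)
} else {
  message("Printed copy not found: ", printed_pdf)
}
```

If the printed copy really is a flattened image of each page, the points should be part of that image and should appear in the PNG. The trade-off is that the copy is no longer vector graphics, so its quality depends on the resolution used when printing.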

E_SD