
I have the following code, which takes a PDF file and composes it into a single JPG image, stacking the PDF pages vertically with a horizontal black line between each pair of pages.

require 'mini_magick'

image = MiniMagick::Image.open(pdf_file)

# create a new blank file which we will use to build a composite image
# containing all of our pages
MiniMagick::Tool::Convert.new do |i|
  i.size "#{image.width}x#{image.layers.size * image.height}"
  i.stroke "black"

  image.layers.count.times do |ilc|
    next if ilc.zero?

    top = ilc * (image.height + 1)
    i.draw "line 0,#{top}, #{image.width},#{top}"
  end

  i.xc "white"
  i << image_file_name
end

composite_image = MiniMagick::Image.open(image_file_name)

# For each PDF page, composite it onto the canvas. The extra pixel per page
# keeps each page from covering the 1px black line drawn to separate pages.
image.layers.count.times do |i|
  composite_image = composite_image.composite(image.layers[i]) do |c|
    c.compose "Over" # OverCompositeOp
    c.geometry "+0+#{i * (image.height + 1)}"
  end
end

composite_image.format(format)
composite_image.quality(85)
composite_image.write(image_file_name)
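The only geometry involved is the vertical offset arithmetic: page i lands at i * (height + 1), reserving one extra row per page for the separator line. Extracted as plain Ruby (the helper name is mine, not part of the script above):

```ruby
# Offsets used by the composite loop: page i is placed at i * (page_height + 1),
# leaving one extra pixel row per page for the 1px separator line.
# (Illustrative helper only; not part of the original script.)
def page_offsets(n_pages, page_height)
  n_pages.times.map { |i| i * (page_height + 1) }
end

page_offsets(3, 100)  # => [0, 101, 202]
```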

It works perfectly, except that a 20-page PDF takes three minutes. I'm looking for a better way to do this. I suspect one of these two options will work:

  1. Compose all of the PDF page images at once, although I haven't figured out how to do that.
  2. Use vips, thanks to its pipeline implementation.

I would rather stay with ImageMagick, but I am open to either approach. I'm looking for pointers on how to achieve this.

Brandon

2 Answers


I had a stab at a ruby-vips version:

require 'vips'

# n: is the number of pages to load; -1 means load all pages as one tall, thin image
image = Vips::Image.pdfload ARGV[0], n: -1

# we can get the number of pages and the height of each page from the metadata
n_pages = image.get 'pdf-n_pages'
page_height = image.get 'page-height'

# loop down the image cutting it into an array of separate pages
pages = (0 ... n_pages).map do |page_number|
  image.crop(0, page_number * page_height, image.width, page_height)
end 

# make a 50-pixel-high black strip to separate each page
strip = Vips::Image.black image.width, 50

# and join the pages again
image = pages.inject do |acc, page|
  acc.join(strip, 'vertical').join(page, 'vertical')
end 

image.write_to_file ARGV[1]
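The `inject` at the end interleaves the strip between consecutive pages. The same idiom with plain strings (so it runs without libvips) makes the pattern easy to see:

```ruby
# The same interleave-with-inject pattern as above, using strings in place
# of images so it can run without libvips installed.
pages = ["page 1", "page 2", "page 3"]
strip = "-----"

joined = pages.inject { |acc, page| [acc, strip, page].join("\n") }
puts joined
# page 1
# -----
# page 2
# -----
# page 3
```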

On this desktop, with this 58-page PDF, I see:

$ /usr/bin/time -f %M:%e ruby ./pages.rb nipguide.pdf x.jpg
152984:1.08
$ vipsheader x.jpg
x.jpg: 595x50737 uchar, 3 bands, srgb, jpegload

So it makes a 50,000-pixel-high JPG in about 1.1 seconds with a peak of 150 MB of memory.

I tried fmw42's clever ImageMagick line:

$ /usr/bin/time -f %M:%e convert nipguide.pdf -background black -gravity south -splice 0x50 -append x.jpg
492244:5.16

so 500 MB of memory and 5.2 s. It makes an image of almost exactly the same size.

The speed difference is mostly down to the PDF rendering library, of course: IM shells out to Ghostscript, whereas ruby-vips calls poppler or PDFium directly. libvips can stream this pipeline, so during evaluation it never holds more than one page in memory at a time.

JPEG has a limit of 65,535 pixels in either axis, so you won't be able to get much larger than this. For shorter documents, you could add `dpi: 300` to the PDF load to get more detail; the default is 72 dpi.
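A quick back-of-the-envelope check for that limit, in plain Ruby (the helper name and the default page height of ~11.7 in, roughly A4, are my assumptions; adjust for your paper size and strip height):

```ruby
# Rough feasibility check against the 65,535-pixel JPEG dimension limit.
# page_inches is an assumption: A4 is ~11.7 in tall; adjust for your paper.
JPEG_MAX_DIM = 65_535

def fits_in_jpeg?(n_pages, dpi, strip_px: 50, page_inches: 11.7)
  total = n_pages * (page_inches * dpi).round + (n_pages - 1) * strip_px
  total <= JPEG_MAX_DIM
end

fits_in_jpeg?(58, 72)   # => true  (the 58-page document above fits at 72 dpi)
fits_in_jpeg?(58, 300)  # => false (far too tall at 300 dpi)
```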

You should get nice text quality without having to render at high resolution. For example, for the PDF linked above, if I run:

$ vips pdfload nipguide.pdf x.png --page 12

to render page 12 at the default 72 dpi, I get:

[sample output: page 12 rendered at 72 dpi]

jcupitt
  • This is working really nicely. I would like to resize my final image but I cannot figure out how. I think I should be using `#reduceh`, but it says `Vips::Error: reduceh: parameter hshrink not set` for my call `image = image.reduceh(72 / dpi) # return image to 72dpi`. Do you have any idea what I am doing wrong? – Brandon Oct 05 '18 at 19:52
  • You can just add `image = image.resize 0.5` (or whatever resize factor you want) just before the final `write_to_file`. Though it would be quicker to change the DPI you load at. You should get nice text quality without oversampling. – jcupitt Oct 06 '18 at 03:39
  • `reduceh` takes the shrink factor, so 2 shrinks by a factor of two. But don't use it! It's part of `resize` and not really useful on its own. – jcupitt Oct 06 '18 at 03:43
  • ... I added a page of sample output. – jcupitt Oct 06 '18 at 03:55
  • Oh, thanks. I actually am trying to read it in at a high DPI and then downsample the final image after the compilation of the pages. Doing that helped the text quality of the imagemagick version, so I am guessing it would do the same for vips. Thanks again! – Brandon Oct 08 '18 at 17:32
  • You shouldn't need to -- just render at the dpi you want and it ought to look nice. libvips uses poppler by default for PDF rendering and it does high quality anti-aliasing. Try clicking on that sample page I uploaded. – jcupitt Oct 08 '18 at 18:55

I am not sure this is what you want, but from your description it seems you want to append the images.

I created a 3-page PDF from 3 JPG images just for testing. I then added a black border (10 pixels here, to make it easier to see) at the bottom of each page and appended all the pages.

This was done with ImageMagick 6.9.10.12 Q16, but I suspect Python Wand and MiniMagick have similar functionality.

convert test.pdf -background black -gravity south -splice 0x10 -append test.jpg


[sample output: three pages appended with 10px black separators]

If necessary, you could chop off the black line at the bottom of the last page after the append using `-chop 0x10`.

fmw42
  • This looks very promising actually. I am actually going PDF -> JPG, which is the opposite of what you did, but I think this will still work. I'll be sure to accept if it does work. Thanks! – Brandon Oct 04 '18 at 20:26
  • I think you may have misunderstood. I created a PDF from 3 images just to use as input to the process, since I did not have a PDF handy. Then I appended the PDF pages on top of each other and saved to JPG, so I was going PDF to JPG. The dimensions of each page and the final JPG can be changed by adding `-density XXX` before reading the PDF: `convert -density 150 test.pdf -background black -gravity south -splice 0x10 -append test.jpg`. The default density is 72 if left off. – fmw42 Oct 04 '18 at 23:07
  • Ah, that makes more sense. I actually wondered if that was what you meant but apparently decided incorrectly. – Brandon Oct 05 '18 at 00:15
  • This worked great as it cut the time taken by 95%. Unfortunately, the text quality from the PDF is _terrible_. I have tried changing the image size and density and neither has helped. If you have any other tips, they would be much appreciated! – Brandon Oct 05 '18 at 01:14
  • You can improve the quality by using a large density before reading the PDF and then resizing back down afterwards, e.g. `convert -density 288 image.pdf -resize 25% ....`. The default density is 72, so 4*72=288, and to get back to normal size use `-resize 25%`, which is 1/4. If you want larger images, resize less, such as 50%. If you need higher quality, use `-density 8*72` and resize by 1/8. This will slow processing but give higher quality; trade off as you desire. – fmw42 Oct 05 '18 at 01:51
  • This works really well, except when I add the resize, the white background behind my text becomes black, even when I remove `-background black`. – Brandon Oct 05 '18 at 15:10
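The supersample-then-shrink arithmetic from the comments above, as a small plain-Ruby helper (the name and interface are mine; 72 dpi is ImageMagick's default density):

```ruby
# Supersample-then-shrink arithmetic: render the PDF at factor x 72 dpi,
# then resize by 100/factor percent to return to nominal size.
# (Illustrative helper only; the name and interface are my own.)
def supersample_args(factor, base_dpi = 72)
  { density: base_dpi * factor, resize_percent: 100.0 / factor }
end

supersample_args(4)  # => {:density=>288, :resize_percent=>25.0}
```

i.e. `-density 288` on load and `-resize 25%` after, as in fmw42's example command.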