Is there a programmatic way to transform a sequence of image files into a PDF?

Question

I have a sequence of JPG images. Each scan is already cropped to the exact size of one page. They are sequential pages of a valuable, out-of-print book. The publishing application requires that these pages be submitted as a single PDF file.

I could paste each of these images into a word processor (e.g. OpenOffice). Unfortunately, it's a very big book and I've got quite a few of these books to get through, so that would obviously be time-consuming. This is volunteer work!

My second idea was to use LaTeX (actually pdflatex) - I could make a very simple document that consists of nothing more than a series of in-line image includes. I'm sure that this approach could be made to work, it's just a little on the complex side for something which seems like a very simple job.

It occurred to me that there must be a simpler way - so any suggestions?

I'm on Ubuntu 9.10, my primary programming language is Python, but if the solution is super-simple I'd happily adopt any technology that works.

UPDATE: Can somebody explain what's going wrong here?

sal@bobnit:/media/NIKON D200/DCIM/100HPAIO/bat$ convert '*.jpg' bat.pdf
convert: unable to open image `*.jpg': No such file or directory @ blob.c/OpenBlob/2439.
convert: missing an image filename `bat.pdf' @ convert.c/ConvertImageCommand/2775.

Is there a way in the convert command syntax to specify that bat.pdf is the output?

Thanks

I think it's getting confused by the shell globbing. Try using double quotes or omitting the quotes altogether. — John Feminella, Apr 12 '10 at 00:18
Why are you quoting '*.jpg'? For me, convert *.jpg bat.pdf works. – Anurag Uniyal, Apr 12 '10 at 04:15
Also, it looks like you are outputting the result on your memory card (/media/NIKON D200), which is probably not where you want it. You might run out of space, and in any case you are slightly wearing out the card every time you write to it. — Jouni K. Seppänen, Apr 12 '10 at 06:26
Nope, there's no shortage of space. I'm going to try again tonight. — Salim Fadhley, Apr 12 '10 at 11:40

John Feminella · Answer 1 · 2010-04-12T00:17:37.730

12

It occurred to me that there must be a simpler way - so any suggestions?

You're right, there is! Try this:

sudo apt-get install imagemagick
cd ~/rare-book-images
convert "*.jpg" rare-book.pdf

Note: depending on what shell you're using "*.jpg" might not work as expected. Try omitting the quotes and seeing if that gets you the results you expect.
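Since the asker's main language is Python, one way to sidestep the quoting question is to build the file list in Python and pass it to convert explicitly. This is only a minimal sketch: it assumes ImageMagick's convert is on the PATH, the helper names (natural_key, build_convert_command, images_to_pdf) are made up for illustration, and matching the extension case-insensitively is my own assumption in case the files are named .JPG.

```python
import re
import subprocess
from pathlib import Path


def natural_key(name):
    """Sort key that orders 'page2.jpg' before 'page10.jpg'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def build_convert_command(filenames, output="rare-book.pdf"):
    """Return the argument list for convert: inputs in page order, then the output."""
    pages = sorted(
        (f for f in filenames if f.lower().endswith((".jpg", ".jpeg"))),
        key=natural_key,
    )
    if not pages:
        raise ValueError("no JPG files found")
    return ["convert", *pages, output]


def images_to_pdf(directory=".", output="rare-book.pdf"):
    """Run convert on every JPG in `directory` (hypothetical helper)."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    # No shell is involved, so glob expansion and quoting cannot interfere.
    subprocess.run(build_convert_command(names, output), cwd=directory, check=True)
```

Calling images_to_pdf("/path/to/scans") would then write rare-book.pdf into that directory. Because the last argument is always the output file, there is no ambiguity about which name convert treats as the result.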

edited Apr 12 '10 at 00:17

answered Apr 11 '10 at 23:15

John Feminella


I would recommend trying it on a subset of the files first, just to make sure things look good for the first few pages. If you have a lot of pages, this will be an expensive operation. – John Feminella Apr 11 '10 at 23:16
you may want to use quotes (`'*.jpg'`) since imagemagick is smarter about getting things in the right order than the shell. – cobbal Apr 11 '10 at 23:17
@cobbal: That's not a bad idea, thanks. – John Feminella Apr 11 '10 at 23:20
That sounds like a great solution! I'm going to try it out now. Sal – Salim Fadhley Apr 11 '10 at 23:25
That is really freakin' simple. :) – jathanism Apr 11 '10 at 23:49
It does not seem to work as expected, see the update above. – Salim Fadhley Apr 11 '10 at 23:51
@Salim: Hmm, that's odd. What happens if you omit the quotes? – John Feminella Apr 12 '10 at 00:09
+1. But as far as I remember, convert may consume a lot of memory if there are many pages. A better solution in this case would probably be to convert each image separately (with convert or sam2p) and then concatenate the results with pdftk. – sastanin Apr 12 '10 at 14:04
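The page-at-a-time approach from the last comment could be scripted roughly like this. It is a sketch, not a tested pipeline: it assumes both convert and pdftk are installed, that the image names are zero-padded so plain sorting gives page order, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def page_pdf_name(image_name):
    """Map 'page001.jpg' to 'page001.pdf'."""
    return str(Path(image_name).with_suffix(".pdf"))


def build_commands(image_names, output="rare-book.pdf"):
    """Return one convert command per image, followed by a single pdftk merge."""
    pages = sorted(image_names)  # assumes zero-padded names
    page_pdfs = [page_pdf_name(name) for name in pages]
    commands = [["convert", image, pdf] for image, pdf in zip(pages, page_pdfs)]
    # pdftk syntax: pdftk in1.pdf in2.pdf ... cat output out.pdf
    commands.append(["pdftk", *page_pdfs, "cat", "output", output])
    return commands


def run_all(image_names, output="rare-book.pdf"):
    """Run the commands in order, stopping at the first failure."""
    for command in build_commands(image_names, output):
        subprocess.run(command, check=True)
```

Each convert call only ever holds one page in memory, which is the point of the suggestion. The intermediate one-page PDFs are left on disk and would need to be cleaned up afterwards.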

ars · Accepted Answer · 2010-04-12T06:18:12.543

If you're interested in a Python solution, you can use the ReportLab library. For example:

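from glob import glob

from reportlab.lib.pagesizes import letter
from reportlab.platypus import Image, SimpleDocTemplate

# glob() returns files in arbitrary order, so sort to keep the pages in sequence.
filenames = sorted(glob('*.jpg'))

doc = SimpleDocTemplate('image-collection.pdf', pagesize=letter)
# Without an explicit width/height, each Image is drawn at its pixel dimensions,
# so large scans may not fit the page and may need to be scaled.
parts = [Image(filename) for filename in filenames]
doc.build(parts)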

This will take all the jpg files in your current directory and produce a file called "image-collection.pdf".
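Since each scan is already cropped to one page, an alternative is to make every PDF page exactly the size of its image, using ReportLab's lower-level canvas instead of a fixed letter page. The following is a sketch under assumptions: the 300 dpi default is a guess at the scan resolution, and the function names are hypothetical.

```python
def page_size_points(width_px, height_px, dpi=300):
    """Convert pixel dimensions to PDF points (1 point = 1/72 inch)."""
    return (width_px * 72.0 / dpi, height_px * 72.0 / dpi)


def images_to_pdf(filenames, output="image-collection.pdf", dpi=300):
    """Write one PDF page per image, each page sized to match its scan."""
    # Imported here so page_size_points can be used without ReportLab installed.
    from reportlab.lib.utils import ImageReader
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(output)
    for filename in sorted(filenames):
        width_px, height_px = ImageReader(filename).getSize()
        width, height = page_size_points(width_px, height_px, dpi)
        pdf.setPageSize((width, height))  # page matches this image
        pdf.drawImage(filename, 0, 0, width=width, height=height)
        pdf.showPage()  # finish the page and start the next one
    pdf.save()
```

Usage would be something like images_to_pdf(glob('*.jpg'), dpi=300) after importing glob as in the example above. The dpi value only affects the physical page size recorded in the PDF, not the image data itself.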

score 0 · Answer 3 · answered Apr 13 '10 at 17:29

I wonder if you could just do it with a for loop with a \includegraphics command inside and some suitably nifty standard image file naming and so on inside a LaTeX file. This might have the advantage of allowing title pages etc and page numbering and so on. (I'm not sure either of the other solutions do this and I can't be bothered to check. I'm just pondering out loud here, really)
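To make the loop idea concrete, here is one way to generate such a LaTeX file from Python rather than looping inside LaTeX itself. This is a sketch only: the preamble (geometry with zero margins, empty page style) is my assumption about a sensible layout and may need adjusting, build_latex is a hypothetical name, and file names containing spaces or special characters would need extra care.

```python
def build_latex(image_names):
    """Return LaTeX source with one full-page image include per file."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{graphicx}",
        r"\usepackage[margin=0pt]{geometry}",
        r"\pagestyle{empty}",
        r"\begin{document}",
    ]
    for name in sorted(image_names):  # assumes names sort into page order
        lines.append(
            r"\noindent\includegraphics"
            r"[width=\paperwidth,height=\paperheight,keepaspectratio]"
            "{" + name + "}"
        )
        lines.append(r"\newpage")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Writing the returned string to, say, book.tex and running pdflatex book.tex in the image directory should produce the PDF. A title page or other front matter could be added by inserting more lines after the document begins.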
