Thumbnail the first page of a pdf from a stream in GraphicsMagick

Question

I know how to use GraphicsMagick to make a thumbnail of the first page of a pdf if I have a pdf file and am running gm locally. I can just do this:

gm(pdfFileName + "[0]")
  .background("white")
  .flatten()
  .resize(200, 200)
  .write("output.jpg", (err, res) => {
    if (err) console.log(err);
  });

If I have a file called doc.pdf then passing doc.pdf[0] to gm works beautifully.

But my problem is I am generating thumbnails on an AWS Lambda function, and the Lambda takes as input data streamed from a source S3 bucket. The relevant slice of my lambda looks like this:

// Download the image from S3, transform, and upload to a different S3 bucket.
async.waterfall([
  function download(next) {
    s3.getObject({
      Bucket: sourceBucket,
      Key: sourceKey
    },
    next);
  },

  function transform(response, next) {
    gm(response.Body).size(function(err, size) {       // <--- gm USED HERE
    .
    .
    .

Everything works, but for multipage PDFs, gm generates the thumbnail from the last page of the PDF. How do I get the [0] in there? I did not see a page selector in the gm documentation, as all of its examples use filenames, not streams. I believe there should be an API for this, but I have not found one.

(Note: the [0] is really important, not only because the last page of a multipage PDF is sometimes blank, but also because I noticed, when running gm on the command line with large PDFs, that the [0] version returns very quickly, while without the [0] the whole PDF is scanned. On AWS Lambda, it's important to finish quickly to save on resources and avoid timeouts!)

One possible solution, with some redundancy, would be to store the S3 object locally in the `/tmp/` folder (every Lambda has access to the `/tmp` directory) and then use `gm(pdfFileName + "[0]")`. In other words, download the file from S3 into the Lambda's temporary folder and run `gm` the same way you would run it locally. — toske, Jun 26 '18 at 02:31
Thanks, I can give this a try, but I'm surprised there's no function or function parameter. Perhaps the bounty will attract some attention. :) — Ray Toal, Jul 05 '18 at 20:56

score 5 · Accepted Answer · answered Jul 11 '18 at 02:17

You can use the .selectFrame() method, which is equivalent to specifying [0] directly in the file name.

In your code:

function transform(response, next) {
    gm(response.Body)
        .selectFrame(0)       // <--- select the first page
        .size(function(err, size) {
        .
        .
        .

Don't be confused by the name of the function. It works not only with frames in GIFs but also with pages in PDFs.
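
For reference, here is a minimal sketch of what the whole thumbnail step might look like when the PDF arrives as a Buffer (such as response.Body from s3.getObject). The helper name thumbnailFirstPage, the "input.pdf" filename hint, and the 200x200 JPEG output are assumptions made for illustration, not part of the original Lambda. The gm module is passed in as a parameter only so the helper is easy to exercise on its own.

```javascript
// Sketch only: thumbnailFirstPage is a hypothetical helper name.
// `gm` is the function exported by the gm package (or a subclass of it).
function thumbnailFirstPage(gm, pdfBuffer, callback) {
  // The second argument is a filename hint (assumed here to be "input.pdf")
  // so that gm knows the buffer holds a PDF.
  gm(pdfBuffer, "input.pdf")
    .selectFrame(0)        // first page, like appending [0] to a filename
    .background("white")   // same options as the file-based example
    .flatten()
    .resize(200, 200)
    .toBuffer("jpg", callback); // callback receives (err, thumbnailBuffer)
}
```

Inside the waterfall it could be called as thumbnailFirstPage(require("gm"), response.Body, next), after which the next step would upload the resulting buffer.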

Check out this function's source on GitHub.

Credit to @BenFortune for his answer to a similar question about selecting the first frame of a GIF. I took it as inspiration and tested this solution with PDFs, and it works.

Hope it helps.

Perfect, I was hoping it would be that easy. I must have browsed all the source files in `lib` _except_ for args! (I did grep for `[0]` but wouldn't you know of _course_ it would have been a variable.) Makes perfect sense. — Ray Toal, Jul 11 '18 at 07:02
