I know how to use GraphicsMagick to make a thumbnail of the first page of a PDF when I have the PDF file and am running gm locally. I can just do this:
    gm(pdfFileName + "[0]")
        .background("white")
        .flatten()
        .resize(200, 200)
        .write("output.jpg", (err, res) => {
            if (err) console.log(err);
        });
If I have a file called doc.pdf, then passing doc.pdf[0] to gm works beautifully.
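To be explicit about what gm receives in the local case, here is a minimal sketch of how that source string is built. The helper name withPageSelector is made up for illustration and is not part of gm; pages are zero-indexed, so [0] selects the first page.

```javascript
// Hypothetical helper (not part of gm): append a zero-based page selector
// such as "[0]" to a file name, producing the string passed to gm().
function withPageSelector(fileName, pageIndex = 0) {
  if (!Number.isInteger(pageIndex) || pageIndex < 0) {
    throw new RangeError("pageIndex must be a non-negative integer");
  }
  return `${fileName}[${pageIndex}]`;
}

console.log(withPageSelector("doc.pdf")); // prints "doc.pdf[0]"
```

This only works because the selector rides along on the file name, which is exactly what I don't have when the input is a stream.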
My problem is that I am generating thumbnails in an AWS Lambda function, and the Lambda takes its input as data streamed from a source S3 bucket. The relevant slice of my Lambda looks like this:
    // Download the image from S3, transform, and upload to a different S3 bucket.
    async.waterfall([
        function download(next) {
            s3.getObject({
                Bucket: sourceBucket,
                Key: sourceKey
            },
            next);
        },
        function transform(response, next) {
            gm(response.Body).size(function(err, size) { // <--- gm USED HERE
                .
                .
                .
Everything works, but for multipage PDFs, gm is generating a thumbnail from the last page of the PDF. How do I get the [0] in there? I did not see a page selector in the gm documentation, as all their examples used filenames, not streams. I believe there should be an API, but I have not found one.
(Note: the [0] is really important, not only because the last page of a multipage PDF is sometimes blank, but also because I noticed, when running gm on the command line with large PDFs, that the [0] returns very quickly, while without the [0] the whole PDF is scanned. On AWS Lambda, it's important to finish quickly to save on resources and avoid timeouts!)