How can I extract images from a PDF in linux while preserving transparency?

Question

I've tried using pdfextract to extract images from a PDF, and while it does extract the images I want, it extracts them with a black background. However, it also extracts a "mask" image, which I believe is the alpha channel.

I've read through http://www.imagemagick.org/Usage/masking, but I see no example of applying an already-extracted mask to an existing image to restore transparency. Is there a way to do this using ImageMagick? If not, is there an easier way to extract images from a PDF while preserving transparency?

Try `convert -background none image.pdf image.png` to render the PDF to a PNG with a transparent background. If the input PDF is CMYK, add `-colorspace sRGB` before image.pdf. — fmw42, Jan 16 '20 at 18:30

Ben Davis · Accepted Answer · 2020-01-17T19:53:44.063

I just found the answer in this post:

convert extracted-image.png extracted-image-mask.png -alpha off -compose copy-opacity -composite bug.png
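A minimal sketch that wraps the command above in a reusable shell function, with comments on what each option does. It assumes ImageMagick's `convert` is on the PATH (on ImageMagick 7 you may need `magick` instead) and that the mask has the same pixel dimensions as the image; the function name `apply_mask` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: apply_mask IMAGE MASK OUTPUT
# Assumes ImageMagick's `convert` and a mask with the same size as IMAGE.
apply_mask() {
    image=$1
    mask=$2
    out=$3
    # -alpha off: switch off any alpha channel in both inputs, so the image
    #   keeps its colors and the mask is read as plain grayscale.
    # -compose copy-opacity -composite: copy the mask's gray levels into the
    #   first image's alpha channel (white = opaque, black = transparent).
    convert "$image" "$mask" -alpha off -compose copy-opacity -composite "$out"
}
```

For example, `apply_mask extracted-image.png extracted-image-mask.png bug.png` runs the same command as the answer. To check that the result really has transparency, `convert bug.png -alpha extract -format '%[fx:mean]' info:` prints the mean opacity: 1 means fully opaque, and lower values mean some pixels are transparent.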

If anyone's interested, I made a little script to do all the steps at once: https://gist.github.com/bendavis78/ed22a974c2b4534305eabb2522956359
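The linked gist is the author's own script. As an independent illustration only, here is a minimal batch sketch that applies every mask in a directory. It assumes masks are named `NAME-mask.png` next to `NAME.png`, which simply mirrors the example file names in the answer; check what your extraction tool actually produces and adjust the pattern. The function name `apply_masks_in_dir` and the `-alpha.png` output suffix are hypothetical.

```shell
#!/bin/sh
# Hypothetical batch helper: apply_masks_in_dir [DIR]
# For each DIR/NAME-mask.png (assumed naming), merge it into DIR/NAME.png
# and write DIR/NAME-alpha.png, printing each output path.
apply_masks_in_dir() {
    dir=${1:-.}
    for mask in "$dir"/*-mask.png; do
        [ -e "$mask" ] || continue            # glob matched nothing
        image=${mask%-mask.png}.png           # NAME-mask.png -> NAME.png
        if [ ! -e "$image" ]; then
            echo "skipping $mask: no matching $image" >&2
            continue
        fi
        out=${mask%-mask.png}-alpha.png
        # Same compositing step as the answer: mask gray levels become alpha.
        convert "$image" "$mask" -alpha off -compose copy-opacity -composite "$out"
        echo "$out"
    done
}
```

For example, `apply_masks_in_dir path/to/extracted` processes every matching pair in that directory and leaves the original files untouched.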

edited Jan 17 '20 at 19:53

answered Jan 16 '20 at 18:23
