I am trying to convert the first page of a PDF (containing both an image and text) to an image using Wand (the ImageMagick binding for Python). The output looks like this: https://i.stack.imgur.com/OdCMZ.jpg The text here is not part of the image. The image does not span the full page; it is shown on only one half.

If the PDF doesn't contain any text, the image spans the full page, like this: https://i.stack.imgur.com/NmcjO.jpg Here the text is part of the image.

I don't understand whether the problem is with the text or with the library. How can I make the first image span the full page as well?

1 Answer

When converting a PDF, ImageMagick uses Ghostscript to render it. If you want the result to contain only the content inside the page's crop box, you can call Ghostscript directly and pass the '-dUseCropBox' option.

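import subprocess

# Call Ghostscript directly instead of going through ImageMagick/Wand.
cmd = [
    'gs',
    '-q',
    '-dQUIET',
    '-dSAFER',
    '-dBATCH',
    '-dNOPAUSE',
    '-dNOPROMPT',
    '-dMaxBitmap=500000000',
    '-dAlignToPixels=0',
    '-dGridFitTT=2',
    '-dUseCropBox',             # render the crop box rather than the full media box
    '-dTextAlphaBits=4',        # anti-alias text
    '-dGraphicsAlphaBits=4',    # anti-alias graphics
    '-r{0}x{0}'.format(200),    # resolution in DPI (horizontal x vertical)
    '-sDEVICE=jpeg',
    '-dJPEGQ=100',
    '-sOutputFile=%05d.jpg',    # one file per page: 00001.jpg, 00002.jpg, ...
    'test.pdf'
]
subprocess.call(cmd)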
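Since the question only asks for the first page, the command above can be limited to a single page with Ghostscript's -dFirstPage and -dLastPage switches. The following is a minimal sketch, not a tested drop-in: the helper names build_gs_command and render_page and the output file name are made up for illustration, and it assumes the Ghostscript executable is called gs and is on PATH (on Windows it is usually named differently, for example gswin64c).

```python
import shutil
import subprocess


def build_gs_command(pdf_path, output_path, dpi=200, page=1):
    """Build a Ghostscript argument list that renders one page as a JPEG.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    return [
        "gs",                              # assumed executable name
        "-q",
        "-dSAFER",
        "-dBATCH",
        "-dNOPAUSE",
        "-dUseCropBox",                    # use the crop box, as in the answer
        "-dFirstPage={}".format(page),     # render only this page...
        "-dLastPage={}".format(page),      # ...and stop after it
        "-dTextAlphaBits=4",
        "-dGraphicsAlphaBits=4",
        "-r{}".format(dpi),                # resolution in DPI
        "-sDEVICE=jpeg",
        "-dJPEGQ=100",
        "-sOutputFile={}".format(output_path),
        pdf_path,
    ]


def render_page(pdf_path, output_path, dpi=200, page=1):
    """Run Ghostscript and return the output path; raises if gs fails."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript executable 'gs' not found on PATH")
    subprocess.run(build_gs_command(pdf_path, output_path, dpi, page), check=True)
    return output_path
```

Calling render_page('test.pdf', 'first_page.jpg') would then write only page 1. Building the argument list in a separate function keeps it easy to inspect or log before anything is executed.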
c2o93y50