
I convert a PDF to images using pdf2image, which is a Python package.

However, the resulting images show PDF page information (?) that is not visible in a PDF viewer.

How can I remove this page information from the PDF itself, rather than from the images?

PDF file link is https://1drv.ms/b/s!Ar1AW_VI_HwvkMAOyDmQhFEKrZnRWg?e=fvWEwN


yes89929

1 Answer


The page content shown by a viewer is constrained by a "crop box" or "trim box", which in most cases is identical to the paper "media box". However, when crop marks are used for printing or display, the crop box area is smaller than the media box area.

pdf2image has a setting for crop boxes, use_cropbox=True (the default is False), so you would need to set that argument in your call.

use_cropbox

Uses the PDF cropbox instead of the default mediabox. This is a rather dark feature that should be set to true when the module does not seem to work with your data.
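
As a minimal sketch of that call, assuming pdf2image and poppler are installed: the function name render_with_cropbox, the output folder and the file naming are illustrative choices, not part of pdf2image, and I have not run this against the linked file.

```python
from pathlib import Path


def render_with_cropbox(pdf_path, out_dir, dpi=200):
    """Render every page to PNG, asking poppler to honour the CropBox."""
    # Imported here so the module still loads where pdf2image is missing.
    from pdf2image import convert_from_path

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # use_cropbox=True renders the CropBox instead of the default MediaBox.
    pages = convert_from_path(str(pdf_path), dpi=dpi, use_cropbox=True)

    saved = []
    for number, page in enumerate(pages, start=1):
        target = out_dir / f"page-{number:03d}.png"
        page.save(target, "PNG")  # each page is a PIL image
        saved.append(target)
    return saved


if __name__ == "__main__":
    for path in render_with_cropbox("process data sheet.pdf", "output"):
        print(path)
```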

However, looking into the file, the values differ from what I would expect. A source page is defined as
<< /CropBox [ 0 0 676 855] /MediaBox [ 0 0 676 856]...
so there would be no noticeable difference; 1 unit is only 1/72".
But 48 pages later have additional (LaTeX?) crop box values of
<</CropBox[32.4 32.4 643.6 823.6]... and these seem to be what affects the trimmed viewport.

pdfinfo filename.pdf reports the cropped area: Page size: 611.2 x 791.2 pts (letter)
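
The reported page size can be checked against the second crop box with simple arithmetic. This is only a sanity check on the numbers quoted above; the helper name box_size is made up for the example.

```python
def box_size(box):
    """Return (width, height) in points for a PDF box [left bottom right top]."""
    left, bottom, right, top = box
    # Round to one decimal to hide floating-point noise in the subtraction.
    return round(right - left, 1), round(top - bottom, 1)


crop_box = (32.4, 32.4, 643.6, 823.6)   # later CropBox found in the file
media_box = (0, 0, 676, 856)            # MediaBox of a source page

print(box_size(crop_box))    # (611.2, 791.2), matching the pdfinfo page size
print(box_size(media_box))   # (676, 856)
```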

Because there are two conflicting settings, and I have no working pdf2image set-up for testing, I am not 100% confident that use_cropbox=True will always work reliably.

Other methods might work better: Ghostscript and the other applications that Python packages depend on have similar, or alternative, means to clip the image output directly from the file. Using poppler directly, we get the same default output as pdf2image.

However, if we specify -cropbox, the secondary crop will, in this case, be taken into account.

pdftoppm -png -cropbox "process data sheet.pdf" output


If that did not work we would need to define the exact area using

  -x <int>                                 : x-coordinate of the crop area top left corner
  -y <int>                                 : y-coordinate of the crop area top left corner
  -W <int>                                 : width of crop area in pixels (default is 0)
  -H <int>                                 : height of crop area in pixels (default is 0)
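
A rough sketch of how those pixel values could be derived from the crop box quoted earlier. It assumes the page is rendered from the 676 x 856 media box, that -x and -y are measured from the top-left of that rendered page, and a resolution of 200 DPI set with -r; the helper names crop_args and pdftoppm_command are hypothetical, and the result is untested against the linked file.

```python
import shlex


def crop_args(crop_box, media_box, dpi):
    """Convert a CropBox in PDF points to pixel values for -x, -y, -W, -H."""
    scale = dpi / 72.0  # PDF user units are 1/72 inch
    left, bottom, right, top = crop_box
    media_left, _, _, media_top = media_box
    x = round((left - media_left) * scale)
    # PDF y runs upward from the bottom; pdftoppm wants the offset from the top.
    y = round((media_top - top) * scale)
    width = round((right - left) * scale)
    height = round((top - bottom) * scale)
    return x, y, width, height


def pdftoppm_command(pdf_path, prefix, crop_box, media_box, dpi=200):
    """Build (but do not run) a pdftoppm call that clips to the given area."""
    x, y, width, height = crop_args(crop_box, media_box, dpi)
    return [
        "pdftoppm", "-png", "-r", str(dpi),
        "-x", str(x), "-y", str(y),
        "-W", str(width), "-H", str(height),
        pdf_path, prefix,
    ]


if __name__ == "__main__":
    command = pdftoppm_command(
        "process data sheet.pdf", "output",
        crop_box=(32.4, 32.4, 643.6, 823.6),
        media_box=(0, 0, 676, 856),
    )
    print(shlex.join(command))  # pass the list to subprocess.run to execute it
```

Only the 48 pages that carry the second crop box would need these values, so limiting the call to those pages with -f and -l may be necessary.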
K J