
I need to determine per-page document information from a PostScript or a PCL file. Preferably in Java, but a Ghostscript/GhostPCL solution would be fine as well.

Here is what I tried in order to get the following information:

Page color

This can be achieved with Ghostscript/GhostPCL using the inkcov device, which reports the ink coverage of each page.

PostScript
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=inkcov -o- input.ps

PCL6
gpcl6win64 -dNOPAUSE -dBATCH -sDEVICE=inkcov -o- input.pcl
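
To turn the inkcov output into a per-page color/monochrome decision, something like the following Python sketch could be used. It assumes the usual inkcov line format (four coverage fractions followed by "CMYK OK"); the function names and the threshold parameter are hypothetical. Note that, depending on the color conversion in effect, gray content specified in RGB may still report small C, M and Y values, so the threshold may need tuning.

```python
import re

# One inkcov result line looks like:
#  0.10022  0.09563  0.10071  0.06259 CMYK OK
INKCOV_LINE = re.compile(
    r"^\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+CMYK\s+OK\s*$"
)


def parse_inkcov(output):
    """Return one (c, m, y, k) tuple of coverage fractions per page."""
    pages = []
    for line in output.splitlines():
        match = INKCOV_LINE.match(line)
        if match:
            pages.append(tuple(float(v) for v in match.groups()))
    return pages


def is_color_page(coverage, threshold=0.0):
    """Treat a page as color when any of C, M or Y exceeds the threshold."""
    c, m, y, _k = coverage
    return max(c, m, y) > threshold
```

The text passed to parse_inkcov would be whatever the commands above print; lines such as "Page 1" are simply skipped.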

Page size

There is a device called bbox which gives me the bounding box per page for PostScript or PCL6 documents.

PostScript
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=bbox -o- input.ps

PCL6
gpcl6win64 -dNOPAUSE -dBATCH -sDEVICE=bbox -o- input.pcl
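
The bbox device prints a %%BoundingBox and a %%HiResBoundingBox line per page (on stderr, as far as I know). A minimal Python sketch for collecting the integer boxes from that text could look like this; parse_bboxes and bbox_size are hypothetical helper names, and the values are in PostScript points.

```python
import re

# Matches only the integer variant; "%%HiResBoundingBox:" lines do not
# start with "%%BoundingBox:" and are therefore skipped.
BBOX_LINE = re.compile(
    r"^%%BoundingBox:\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s*$",
    re.MULTILINE,
)


def parse_bboxes(output):
    """Return one (llx, lly, urx, ury) tuple per page, in points."""
    return [tuple(int(v) for v in m.groups()) for m in BBOX_LINE.finditer(output)]


def bbox_size(bbox):
    """Width and height of the marked area, not of the media."""
    llx, lly, urx, ury = bbox
    return urx - llx, ury - lly
```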

But in the end the bounding box is an inaccurate approximation of the page size. I checked the post "Getting the page sizes of a PostScript document", but the solution does not seem to work with my Ghostscript version 9.5.

CAA

2 Answers

The bbox device should provide accurate information; in what way is it inaccurate? I'd test it myself, but you haven't supplied a file to demonstrate this.

You need to bear in mind that it's possible some objects (e.g. images) might mark the page with white space. That still counts as marking the page for the purposes of the bbox device. If you want to count only non-white output samples, then you need to render the document (at the final resolution you intend to use) and actually count the non-white pixels. That's a potentially very slow operation because it needs to read every output colour sample of what could be a very large image.

It's not hard to code though, and you could use the inkcov device as a basis for doing both operations in the same pass.

Or you could just have GhostPDL deliver the rendered bitmap for you and code a solution to the bounding box using some other tool/language.
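
As a rough illustration of the "other tool/language" route, the Python sketch below scans an already decoded grayscale bitmap and returns both the bounding box of the non-white samples and their count. The in-memory layout (a list of rows, top row first, one integer per sample) and the function name are assumptions made for the example; decoding the rendered file into that layout is left out.

```python
def nonwhite_bbox(rows, white=255):
    """Scan a grayscale bitmap and return (bbox, count).

    rows  -- rows of samples, top row first (assumed in-memory layout)
    bbox  -- (left, top, right, bottom) in pixels, right/bottom exclusive,
             or None when every sample equals `white`
    count -- number of non-white samples
    """
    left = top = right = bottom = None
    count = 0
    for y, row in enumerate(rows):
        for x, sample in enumerate(row):
            if sample == white:
                continue
            count += 1
            if left is None:
                # First mark found: rows are scanned top-down, so `top` is final.
                left, top, right, bottom = x, y, x + 1, y + 1
            else:
                left = min(left, x)
                right = max(right, x + 1)
                bottom = max(bottom, y + 1)
    bbox = None if left is None else (left, top, right, bottom)
    return bbox, count
```

The result is in pixels with the origin at the top left, so it still has to be scaled by 72/resolution and flipped vertically to compare it with the PostScript-style values from the bbox device.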

Ah, are you actually looking for the requested media size, rather than the Bounding Box? That's not the same thing at all. The bounding box returns the smallest rectangle which encloses all the marks on the output, it doesn't tell you how big the requested media was. So a small rectangle in the bottom left would give you a tiny BBox, even if the media was large.

You can reasonably easily get the media size requests from PostScript by writing a small PostScript program, but you can't do that with PCL. Perhaps the easiest solution in both cases is to render the content to a file at 72 dpi, then read the width/height of the rendered output; that gives you the media size in points.
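
A small Python sketch of the render-and-measure idea, assuming the page was rendered to a PNG file (for example with one of the PNG devices at -r72): the pixel dimensions are read from the PNG header and converted to points. The function names are hypothetical. Because the image has a whole number of pixels, the result is only as precise as the rendering resolution allows.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_pixel_size(data):
    """Read (width, height) in pixels from the IHDR chunk of PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers after the chunk type.
    return struct.unpack(">II", data[16:24])


def pixels_to_points(pixels, dpi=72):
    """Convert a pixel count at the given resolution to points (1/72 inch)."""
    return pixels * 72.0 / dpi
```

At 72 dpi one pixel corresponds to one point, so an A4 page comes out at roughly 595 x 842 pixels; only the first 24 bytes of each rendered file need to be read.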

Or use the pdfwrite device to convert the input into PDF and then the pdf_info.ps PostScript program can be used to give you the sizes of the pages from the PDF file.

KenS

Indeed, I am looking for the requested media size rather than the bounding box. Maybe I should have been more specific. Here is some ASCII art to brighten up your day.

y
^
|
|
+-----------+
| +----+    |
| |bbox|    |
| +----+    |
|           |
|           |
|           |
|           |
|           |
+-----------+----> x

A simple document with some text in the upper left corner.

KenS: "The bounding box returns the smallest rectangle which encloses all the marks on the output, it doesn't tell you how big the requested media was."

So for the time being, the "easiest" solution really was to convert the PS/PCL file into a PDF and read the media size from there.

Conversion to PDF

PostScript
gswin64c.exe -dBATCH -dNOPAUSE -dNOOUTERSAVE -sDEVICE=pdfwrite -sOutputFile=output.pdf input.ps

PCL6
gpcl6win64 -dBATCH -dNOPAUSE -dNOOUTERSAVE -sDEVICE=pdfwrite -sOutputFile=output.pdf input.pcl
CAA