
I have been trying to find a way to make our OCRed PDF (bad-uc.pdf) behave the same as the copy saved with Infix (good-uc.pdf).

If you open the following two files in Acrobat Reader (any version should show the same problem), you will see that bad-uc.pdf draws the text before the page image (very slowly), whereas good-uc.pdf draws everything together (and seems much faster and more responsive).

good-uc.pdf: https://drive.google.com/file/d/0B-Nxr9ySWJnNX2dZSmVscEZIRmc/view?usp=sharing
bad-uc.pdf: https://drive.google.com/file/d/0B-Nxr9ySWJnNN2t6X2hFNTBxa0U/view?usp=sharing

I have tried pdftk, pdftops, ghostscript, pdf2ps, ps2pdf and qpdf, but still couldn't get the images to load before the text. Can a PDF expert shed some light on why these two PDFs behave differently?

My guess is that Infix restructures the PDF so that the images are loaded before the embedded text, but I cannot find a Linux command-line tool that can do this kind of PDF structure optimization.

Greatly appreciated!! Jeffrey

1 Answer


shed some light on why these two PDFs behave differently...

Actually both your PDFs take about the same time until being properly displayed by Adobe Reader on my computer. But while your bad-uc.pdf first shows the OCR'ed text and then covers it with the scan, the good-uc.pdf first seems to show a plain page and then covers it with the scan.

The cause for this is that good-uc.pdf paints the OCR'ed text in text rendering mode 3 ("invisible", i.e. neither fill nor stroke) while bad-uc.pdf paints it normally in text rendering mode 0 ("fill") with fill color black. As invisible painting may require less time than actual painting in black on white, there might even be an objective difference between the rendering times, but I think the difference is mostly subjective.

mkl
  • Thank you very much, mkl, for the great information! Do you know of any Linux tool that can alter the rendering mode, so that I can get bad-uc.pdf to render in mode 3 ("invisible")? – Jeffrey Ke Jul 12 '16 at 01:31
  • I don't know of any such tools; I merely have an idea of how to implement one. – mkl Jul 12 '16 at 10:46
  • Thanks a lot, sir!! That was a great hint. I have managed to write a simple script that inserts the rendering mode operators in the appropriate places. However, that script might only be useful for our application: all of our PDFs are generated by ABBYY, so the PDF format is consistent enough for my script to insert the operators in the correct places. P.S. Sorry, I really wanted to press the "useful" arrow, but my reputation is not high enough... Again, thanks a lot for the great hint!! AWESOME!! – Jeffrey Ke Jul 15 '16 at 03:17
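
The script itself was not posted in this thread. For readers who want to try the same idea, here is a minimal sketch in Python of what such a rewrite might look like. It is not Jeffrey's script and not an established tool. The function name make_text_invisible is made up for this example. It assumes you already have an uncompressed page content stream as bytes (for instance from a file written by qpdf in QDF mode), that the stream does not set the rendering mode itself with its own Tr operators, and that it contains no inline images (BI ... ID ... EI), whose binary data this simple scanner does not understand.

```python
# Sketch only: insert "3 Tr" (invisible text) after every BT operator
# in an UNCOMPRESSED PDF content stream. Strings, names and comments are
# skipped so that a "BT" inside them is not mistaken for the operator.
# Not handled (assumptions): inline images, existing Tr operators.

WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"


def _end_of_regular_run(stream: bytes, j: int) -> int:
    """Advance j past bytes that are neither whitespace nor delimiters."""
    n = len(stream)
    while j < n and stream[j] not in WHITESPACE and stream[j] not in DELIMITERS:
        j += 1
    return j


def make_text_invisible(stream: bytes) -> bytes:
    """Return a copy of `stream` with ' 3 Tr' added after each BT operator."""
    out = bytearray()
    i, n = 0, len(stream)
    while i < n:
        c = stream[i]
        if c == ord("("):                      # literal string: nests, has escapes
            j, depth = i + 1, 1
            while j < n and depth:
                if stream[j] == ord("\\"):
                    j += 1                     # skip the escaped byte
                elif stream[j] == ord("("):
                    depth += 1
                elif stream[j] == ord(")"):
                    depth -= 1
                j += 1
            j = min(j, n)
        elif stream[i:i + 2] == b"<<":         # dictionary start, not a hex string
            j = i + 2
        elif c == ord("<"):                    # hex string up to the closing '>'
            end = stream.find(b">", i)
            j = n if end < 0 else end + 1
        elif c == ord("%"):                    # comment runs to end of line
            j = i
            while j < n and stream[j] not in b"\r\n":
                j += 1
        elif c == ord("/"):                    # name object such as /F1 or /BT
            j = _end_of_regular_run(stream, i + 1)
        elif c in WHITESPACE or c in DELIMITERS:
            j = i + 1                          # copy single byte unchanged
        else:                                  # number or operator keyword
            j = _end_of_regular_run(stream, i)
        token = stream[i:j]
        out += token
        if token == b"BT":                     # begin-text operator found
            out += b" 3 Tr"
        i = j
    return bytes(out)


if __name__ == "__main__":
    sample = b"q\nBT\n/F1 12 Tf\n(BT inside a string) Tj\nET\nQ\n"
    print(make_text_invisible(sample).decode("latin-1"))
```

The mode is inserted after every BT rather than once at the top of the stream because the text rendering mode is part of the graphics state, so a q ... Q pair can restore it to the default between text objects. If you edit a QDF file this way, the stream lengths and cross-reference offsets change; qpdf ships a fix-qdf helper intended for repairing hand-edited QDF files, though I have not tested this sketch against the PDFs linked in the question.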