
I have a small PDF file, which is supposed to display just the string "Hello World!".

Unfortunately, it displays black boxes instead of the characters. I suppose there is some problem with the fonts, but I am not sure.

Is there a way to diagnose and troubleshoot this issue? All I find on the Internet is advice to try this or that, which helps some people and not others (nothing helped me). It feels like shooting in the dark.

Here is a concrete example. Why does this PDF display black squares instead of the string "Hello World!"?

EDIT

A bit of the context. I am trying to convert a trivial HTML to PDF using the wkhtmltopdf tool. It is an absolute frustration, because according to the Internet searches the tool is supposed to work and do it quite well. But the thing does not work for me and nothing I do changes this! Unfortunately, this tool seems the only free tool to convert HTML to PDF. This is a huge bummer.

mark
  • What makes you think that your sample PDF *is supposed to display just the string "Hello World!"*? As David already pointed out in his answer, it essentially contains operators to display multiple boxes. – mkl Apr 07 '13 at 10:23

3 Answers


If you want to find out whether a PDF is valid or what is wrong with it, there are a few general steps you can take:

1) Open it in Adobe Acrobat or Adobe Reader (on a desktop platform, not a tablet device). For a very long time the PDF format was owned by Adobe, and the way their software handles PDF is still close to the gold standard. There is a caveat, however: Acrobat is very forgiving in the way it handles PDF files, and it will overlook or silently correct a number of mistakes that other PDF engines might have a problem with.

2) Get yourself a preflight tool. These tools were invented for use in graphic arts, but have applications outside of it too. Popular examples are callas pdfToolbox (warning, I'm affiliated with this vendor!) or the "Preflight" plug-in you'll find in Adobe Acrobat Pro (which is actually also callas technology under the hood). Then preflight specifically against the PDF/A-1b or PDF/A-2b standard.

That last point deserves some more explanation. You should pick a PDF/A compliant preflight profile because the PDF/A (or PDF for Archival) standard is extremely picky. Its goal is to make sure that PDF files will still be readable in exactly the same way 50 years from now, and to ensure that, it tests a whole range of properties of the file itself and of the different components in it. You might be able to ignore some of the errors you get (some of them will be connected to the fact that the PDF/A identification isn't correct, for example), but I wouldn't ignore any other errors unless you understand exactly what they mean and why they aren't relevant.

PS: Can you make your test file available some other way? The file you shared in your question is useless I think. When I do "Download" I get a PDF file that doesn't contain text and doesn't have fonts in it. Those rectangles you see are exactly that - rectangles. So this PDF renders fine - it's the PDF generation process (or the fact that you stored the file on Google docs - I really have no clue what that might do) that went berserk apparently.

David van Driessche

In addition to David's hints (first using a known good viewer and then some preflight tool), there is a third level in the inspection process:

3) Inspect the PDF with your own eyes, with the PDF specification (made available by Adobe here) at hand: first in a text viewer for a first impression and then, if the cause of the issue is not immediately visible, in a PDF browsing tool for in-depth analysis.

This step is quite cumbersome at first but after some time you learn your way around in the PDFs.

One example of such a PDF browsing tool is RUPS, but there are others, too.

mkl

'Small PDF file supposed to display "Hello World!"'

Not correct. The file you linked to does not contain any code that could render pixels on screen or on paper that a human brain would read as "Hello World!". The file contains only vector drawing operations, which result in 12 black boxes.

The command line tool pdffonts does not indicate any font being used in the file:

pdffonts so-file-#15858199.pdf

What could still cause the "rendering" of the words you are looking for: some vector or pixel drawing code contained in the PDF. To find out about this, you'll have to look into the low level source code of the PDF.
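One way to peek at that low-level code without extra tools is to inflate the compressed streams yourself. The following is only a minimal sketch, assuming the streams use the common FlateDecode (zlib) filter and that the file is not encrypted; the function name `inflate_streams` is made up for this illustration, and the regular expression is a rough heuristic, not a real PDF parser.

```python
import re
import zlib

# Heuristic: grab the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decoded bytes of every stream that zlib can inflate."""
    decoded = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that trail the data.
            decoded.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate-compressed (or damaged): skip it in this sketch.
            continue
    return decoded
```

Calling it on the bytes of a file read in binary mode (for example `inflate_streams(open(path, "rb").read())`, with `path` pointing at your PDF) gives you the content streams as plain bytes that you can print and read.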

The original file is only 1,570 bytes, so this task should not be overly large.

'Is there a way to diagnose and troubleshoot this issue?'

Using qpdf, a "command-line program that does structural, content-preserving transformations on PDF files", you can expand all contained streams (which are normally compressed):

qpdf --qdf --object-streams=disable so-file-#15858199.pdf qdf-#15858199.pdf

The resulting file, qdf-#15858199.pdf, is 3,875 bytes. Now open it in a text editor. PDF object no. 6 (lines 66-219) contains the contents of the page. Lines 123-194 contain only the operators m (moveto), l (lineto) and h (closepath). These lines contain 12 different groups of drawing commands, where each one represents the path for one of the 12 black boxes you see rendered on screen or printed on paper:

102.400001 12.8000001 m
268.800004 12.8000001 l
268.800004 179.200002 l
102.400001 179.200002 l
102.400001 12.8000001 l
h

Line 196 contains

f

which is the fill operator: it fills the (closed) paths constructed so far with the current fill color, here black. Nothing in the other lines (which I didn't analyze in detail) does any drawing that may resemble the shapes of any glyphs.
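The same check can be roughed out in a few lines of Python once the content stream is available as uncompressed text. This is a heuristic sketch, not a PDF parser: it splits on whitespace, so operator-like words inside string literals would be miscounted, and the function name `summarize_content` is invented for this example. It counts subpath starts (m, re), fill/stroke operators and text-showing operators (Tj, TJ, ', ").

```python
# Operators as defined in the PDF specification's content stream syntax.
SUBPATH_START_OPS = {"m", "re"}
PAINT_OPS = {"f", "F", "f*", "S", "s", "B", "B*", "b", "b*"}
TEXT_SHOW_OPS = {"Tj", "TJ", "'", '"'}


def summarize_content(content):
    """Count subpaths, painting operators and text-showing operators."""
    counts = {"subpaths": 0, "paint": 0, "text": 0}
    for token in content.split():
        if token in SUBPATH_START_OPS:
            counts["subpaths"] += 1
        elif token in PAINT_OPS:
            counts["paint"] += 1
        elif token in TEXT_SHOW_OPS:
            counts["text"] += 1
    return counts


# One of the box paths quoted above, followed by the fill operator.
box = """102.400001 12.8000001 m
268.800004 12.8000001 l
268.800004 179.200002 l
102.400001 179.200002 l
102.400001 12.8000001 l
h
f"""
print(summarize_content(box))
```

A stream that reports subpaths and painting operators but a text count of zero, as this box does, draws shapes only and shows no text through fonts.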

'Unfortunately, this tool seems the only free tool to convert HTML to PDF'

Not correct either.

1. Assuming your "free" is meant as free as in liberty, then an alternative option is HTMLDOC.

HTMLDOC does not support specific fonts which may be assigned to your HTML input via CSS, but it does a good job in converting one or multiple HTML documents into a single PDF book containing chapters, page-numbering, page headers and footers and more. For all options available, see its full documentation.

2. Assuming your "free" is meant as free as in beer, then an alternative option (for private usage only) could be PrinceXML.

PrinceXML does an extraordinarily good job when it comes to supporting almost all CSS features your HTML document may be using. See its documentation and also some of the sample PDF files produced by PrinceXML.

Aleksandr Kovalev
Kurt Pfeifle