I switched from the old iText library to the iTextPdf library and noticed a problem. The new library sets the producer to a value that includes what I take to be non-Unicode characters (the registered trademark symbol ® and the copyright symbol ©). The problem is that validation programs that read this text choke on these characters.

Can I get iText to fix this (without paying for a license)? I am OK with iText getting credit. I just want the credits to be Unicode clean.

<</Producer(iText® 5.5.0 ©2000-2013 iText Group NV \(AGPL-version\))/ModDate(D:20150126155550-07'00')/CreationDate(D:20150126155550-07'00')>>
Mark.ewd
  • **Can I get iText to fix this (w/o paying for a license)?** is a strange question. Are you using iText in an AGPL application? If so, please share the URL where we can see your code. If you are using iText in another context, you probably need a commercial license anyway. (There are exceptions, though.) Also, it is strange that you would switch to 5.5.0 and not to 5.5.4. – Bruno Lowagie Jan 30 '15 at 16:03
  • I asked the question this way because the site states commercial licensees are released from the requirement to not change the producer. I am using it AGPL and have contributed code to the development. – Mark.ewd Jan 30 '15 at 16:45
  • OK, in that case, there's no problem (although it would help if you added a link to your AGPL project to your profile). Anyway, as @mkl explained: there's really nothing to fix in iText. – Bruno Lowagie Jan 30 '15 at 17:01

1 Answer

You are looking at the document information dictionary of a PDF, more precisely at the value of its Producer entry. The PDF specification defines that entry as:

Producer text string (Optional) If the document was converted to PDF from another format, the name of the conforming product that converted it to PDF.

(Table 317 – Entries in the document information dictionary)

So the value must have the type text string. This in turn is specified as:

The text string type shall be used for character strings that shall be encoded in either PDFDocEncoding or the UTF-16BE Unicode character encoding scheme. PDFDocEncoding can encode all of the ISO Latin 1 character set and is documented in Annex D.

(section 7.9.2.2 Text String Type)
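To make the two encodings concrete, here is a minimal Java sketch of how a reader might decode the raw bytes of a text string. The class and method names are hypothetical and not part of iText. It relies on the fact that a UTF-16BE text string starts with the byte order marker FE FF, and it uses ISO-8859-1 as a stand-in for PDFDocEncoding, which is only an approximation: the two agree for most of the Latin-1 range (including © and ®) but differ in places such as 0x18 to 0x1F and 0x80 to 0xA0.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an iText class: decodes the raw bytes of a PDF text string.
class TextStringDecoder {

    static String decodeTextString(byte[] raw) {
        // A leading FE FF byte order marker signals UTF-16BE.
        if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF) {
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        // Assumption: ISO-8859-1 approximates PDFDocEncoding well enough for this demo.
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "iText" followed by the single byte 0xAE (registered sign in PDFDocEncoding).
        byte[] pdfDoc = {0x69, 0x54, 0x65, 0x78, 0x74, (byte) 0xAE};
        // Byte order marker followed by U+00A9 (copyright sign) in UTF-16BE.
        byte[] utf16 = {(byte) 0xFE, (byte) 0xFF, 0x00, (byte) 0xA9};

        System.out.println(decodeTextString(pdfDoc));
        System.out.println(decodeTextString(utf16));
    }
}
```

A validator that handles both branches decodes the Producer value into ordinary Unicode characters either way.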

In Annex D you find:

               CHAR CODE (OCTAL)
CHAR NAME       STD MAC WIN PDF
...
©    copyright   —  251 251 251
...
®    registered  —  250 256 256
...

(D.2 Latin Character Set and Encodings)

Thus, these characters are completely valid here and validators which choke on these characters are broken.
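As a quick sanity check of the Annex D values, the following Java sketch prints the octal byte codes of the two characters. It is an illustration under one assumption: it encodes with ISO-8859-1, which coincides with PDFDocEncoding for these two code points (it is not a full PDFDocEncoding implementation). The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the single-byte codes of © and ® in octal.
class PdfDocEncodingCheck {

    // Returns the octal value of the single byte that encodes c.
    // Assumption: ISO-8859-1 matches PDFDocEncoding for the characters passed in.
    static String octalCode(char c) {
        byte[] encoded = String.valueOf(c).getBytes(StandardCharsets.ISO_8859_1);
        return Integer.toOctalString(encoded[0] & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println("copyright:  " + octalCode('\u00A9')); // 251
        System.out.println("registered: " + octalCode('\u00AE')); // 256
    }
}
```

Both results match the PDF column of the table, so each symbol is a single, well-defined byte in a PDFDocEncoding text string and maps to a regular Unicode code point (U+00A9 and U+00AE).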

So you had better report this bug to the developers of the validators in question.

mkl