
pdftk lets you set the title of a PDF with the following command:

pdftk input.pdf update_info metadata.txt output output.pdf

However, if I use special characters in the metadata.txt file (such as German or Chinese characters), it doesn't seem to work.

Here's an example of changing the title:

InfoBegin
InfoKey: Title
InfoValue: Fingerspitzengefühl is a German term.

However, the PDF ends up with a strange character in place of the ü.

The pdftk documentation says that non-ASCII characters should be encoded as XML numerical entities, but I Googled myself silly and couldn't find anything that works.

user1914292

2 Answers


The best reference I've found is Numerical Character Reference, which is applicable to XML (and XHTML and SGML).

Numeric character references are generally used to represent characters that cannot be encoded directly in the document's character set.

In your case, the character is ü (U+00FC, decimal 252), which can be substituted with &#252; (decimal) or &#xFC; (hexadecimal). XML does not define an octal form of the reference.
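To see where these numbers come from, here is a small Python sketch that derives both reference forms from a character's Unicode code point. The helper name `numeric_refs` is hypothetical, just for illustration:

```python
def numeric_refs(ch: str) -> tuple[str, str]:
    """Return the (decimal, hexadecimal) XML numeric character references for one character."""
    code_point = ord(ch)  # Unicode code point, e.g. 252 (0xFC) for "ü"
    return f"&#{code_point};", f"&#x{code_point:X};"


print(numeric_refs("ü"))   # ('&#252;', '&#xFC;')
print(numeric_refs("中"))  # ('&#20013;', '&#x4E2D;')
```

The same rule covers Chinese characters: the reference is simply the code point written in decimal or hex.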

Using a decimal reference, your file should be encoded as:

InfoBegin
InfoKey: Title
InfoValue: Fingerspitzengef&#252;hl is a German term.

Note:

If you're on *nix, you can use GNU recode to encode the file:

% cat metadata.txt | recode ..xml
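If recode isn't available, Python's built-in `xmlcharrefreplace` error handler does a comparable conversion. Every character that can't be encoded as ASCII becomes a decimal numeric character reference. This is a minimal sketch that assumes metadata.txt is saved as UTF-8. The function names and the output file name are hypothetical:

```python
from pathlib import Path


def to_decimal_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal XML numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def encode_metadata_file(src: str, dst: str) -> None:
    """Read a UTF-8 pdftk metadata file and write an ASCII-only copy for update_info."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(to_decimal_refs(text), encoding="ascii")


print(to_decimal_refs("InfoValue: Fingerspitzengefühl is a German term."))
# InfoValue: Fingerspitzengef&#252;hl is a German term.
```

For example, `encode_metadata_file("metadata.txt", "metadata-ascii.txt")` followed by `pdftk input.pdf update_info metadata-ascii.txt output output.pdf`.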
dwarring
That works for German indeed! I haven't checked Chinese or other characters yet, but it's blazing fast and the details make sense! – user1914292 Jun 08 '18 at 20:37

This answer seems better because there is no need to install extra tools. Instead, it uses PDFtk’s built-in operations dump_data_utf8 and update_info_utf8, which write and read the metadata as UTF-8:

pdftk input.pdf update_info_utf8 metadata.txt output output.pdf

It works perfectly for Chinese.
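With update_info_utf8, the main requirement is that metadata.txt is actually saved as UTF-8. A file saved in a legacy code page would bring the original problem back. The Python sketch below writes the InfoBegin/InfoKey/InfoValue records with explicit UTF-8 encoding and builds the same pdftk command. The helper names and the field dictionary are hypothetical. The command is only constructed here, not executed:

```python
from pathlib import Path


def write_info_utf8(path: str, fields: dict[str, str]) -> None:
    """Write pdftk InfoBegin/InfoKey/InfoValue records to `path`, encoded as UTF-8."""
    lines: list[str] = []
    for key, value in fields.items():
        lines += ["InfoBegin", f"InfoKey: {key}", f"InfoValue: {value}"]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def pdftk_update_utf8_cmd(input_pdf: str, metadata: str, output_pdf: str) -> list[str]:
    """Argument list for: pdftk input.pdf update_info_utf8 metadata.txt output output.pdf"""
    return ["pdftk", input_pdf, "update_info_utf8", metadata, "output", output_pdf]


write_info_utf8("metadata.txt", {"Title": "Fingerspitzengefühl is a German term."})
cmd = pdftk_update_utf8_cmd("input.pdf", "metadata.txt", "output.pdf")
# To apply it (requires pdftk on PATH): subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting issues when file names contain spaces or non-ASCII characters.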

TomBen