I've been looking into a PDF file to understand how it is built.

I noticed that InDesign has created PDFs with text as below (after decompression using pdftk).

0 Tc /Span<</ActualText<FEFF0009>>> BDC 
4.018 -0.2 Td
( )Tj

I understand the role of ActualText (for copy/paste/searching), but I'm wondering exactly how I should interpret the FEFF0009. It looks like a UTF-16 string with a BOM, representing a tab character. This seems incorrect as it's really a space. Is there a special meaning here?

Nick P
  • *a special meaning here* - probably for InDesign but definitively not for the file as a PDF. – mkl Oct 15 '14 at 05:25

1 Answer

.. This seems incorrect as it's really a space.

No, it's really a tab.

14.9.4 Replacement Text
NOTE 1: Just as alternate descriptions can be provided for images and other items that do not translate naturally into text (as described in the preceding sub-clause), replacement text can be specified for content that does translate into text but that is represented in a nonstandard way.
(PDF 32000-1:2008)

The PDF text engine does not support the concept of 'tabs'. In this case, InDesign mimicked the function of a tab character by inserting a space in the text stream. To cover the distance spanned by the original tab, it could either set the space width to match or use a large relative positioning for the rest of the text (which it did here: the horizontal displacement of 4.018 in your code snippet).

The general idea is that a space is rendered at the position of the tab, but when you copy this text and paste it somewhere else, you get a tab character. I suppose the 'space' is only inserted to have something to copy.
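
To check the reading of FEFF0009 yourself, here is a minimal Python sketch that decodes a PDF hex string by hand. The helper name `decode_pdf_hex_string` is made up for this illustration and is not part of any PDF library. The Latin-1 fallback is only an assumption standing in for PDFDocEncoding, which differs from Latin-1 for some byte values.

```python
import codecs


def decode_pdf_hex_string(hex_literal: str) -> str:
    """Decode a PDF hex string such as '<FEFF0009>' into a Python str.

    Hypothetical helper for illustration only.
    """
    # Drop the angle brackets and any whitespace (PDF allows whitespace
    # between the hex digits).
    digits = "".join(hex_literal.strip().strip("<>").split())
    # If the number of digits is odd, the final digit is assumed to be 0.
    if len(digits) % 2:
        digits += "0"
    raw = bytes.fromhex(digits)
    # A leading FE FF byte-order mark marks the rest as UTF-16BE.
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be")
    # Assumption: approximate PDFDocEncoding with Latin-1 for everything else.
    return raw.decode("latin-1")


actual_text = decode_pdf_hex_string("<FEFF0009>")
print(repr(actual_text))  # prints '\t'
```

The BOM is only an encoding marker and is not part of the text, so what remains after stripping it is the single code unit 0009, i.e. U+0009 (horizontal tab).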

Jongware
  • Thanks that makes a lot of sense now. I just had another look and I'm also seeing some FEFF0007 (BELL?) characters used in a similar way. It looks like maybe someone has copy/pasted formatting characters when building the document. – Nick P Oct 15 '14 at 22:47
  • And strangely it seems that copy/paste from Acrobat does not bring the tab across, it uses the space. – Nick P Oct 17 '14 at 00:33