Why does DITA Open Toolkit PDF plugin rename image href attributes?

Question

I'm sorry if this doesn't have enough information. I don't typically ask for help online like this.

I'm using DITA Open Toolkit 3.4 on Windows. I generated a plugin called "vcr2" using Jarno's (very excellent and helpful) PDF Plugin Generator and then made a handful of customizations. The plugin uses the pdf2 plugin as a base. When I try to use the vcr2 plugin, my images are not working. I've tracked the problem down to malformed image filenames in the image's href attribute.

For example:

In my source file (a DITA Task), the markup for one of my images looks like this:

<image href="MyRemindersChooseReminder.png"/> 

If I run a transform with the pdf2 plugin, the images work fine. In the merged stage1.xml file in the Temp folder, the XML for that same image looks like this:

<image class="- topic/image " href="df2d132af27436c59c5c8c4282e112d62bec8201.png" placement="inline" xtrc="image:1;10:66" xtrf="file:/V:/Vasont/Extract/t12340879-minimal/t12340879.xml"/>

It is processed into a file Topic.fo, and looks like this:

<fo:external-graphic  src="url('file:/V:/Vasont/Extract/t12340879-minimal/MyRemindersChooseReminder.png')"/>

Everything works fine and the image looks fine.

If I run the same file through my 'vcr2' plugin, which just calls the same pdf2 plugin with some overrides, all the images get broken:

In stage1.xml:

<image class="- topic/image " href="df2d132af27436c59c5c8c4282e112d62bec8201.png" placement="inline" xtrc="image:1;10:66" xtrf="file:/V:/Vasont/Extract/t12340879-minimal/t12340879.xml"/>

In Topic.fo:

<fo:external-graphic  src="url('file:/V:/Vasont/Extract/t12340879-minimal/df2d132af27436c59c5c8c4282e112d62bec8201.png')" />

Tracking this down further, it appears that somewhere in the map-reader Ant task the filename gets changed to that cryptic, hexadecimal-looking string. I assume it is supposed to be changed back later, or resolved to a complete URI.

So, the two-part question is: Why does Open Toolkit change my filenames, and what's supposed to change them back?

score 2 · Answer 1 · answered Mar 26 '20 at 16:34

DITA-OT's preprocess uses hashes as temporary file names so that the code does not have to deal with directory structures. This lets preprocess work in so-called "map-first" mode, where it first processes all DITA map resources and only then starts to process DITA topic and image resources.
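To illustrate why a hashed name removes the need to track directories, here is a minimal sketch in Python. It is not DITA-OT's code: the function name `temp_name`, the example paths, and the choice to hash the full source URI with SHA-1 are all assumptions made for illustration (the 40-character hex names in the question are merely consistent with SHA-1).

```python
import hashlib
from pathlib import PurePosixPath


def temp_name(source_uri: str) -> str:
    """Return a flat temporary file name: a hex digest plus the original extension.

    Assumption: the digest is SHA-1 over the source URI string. The thread
    does not say exactly what DITA-OT feeds into its hash.
    """
    digest = hashlib.sha1(source_uri.encode("utf-8")).hexdigest()
    return digest + PurePosixPath(source_uri).suffix


# Two hypothetical images with the same base name in different folders.
a = temp_name("file:/V:/docs/images/MyRemindersChooseReminder.png")
b = temp_name("file:/V:/docs/other/deep/folder/MyRemindersChooseReminder.png")

# Both names are flat (no folder part), yet they do not collide.
print(a)
print(b)
```

Because every resource ends up with a unique name in a single flat folder, later steps can refer to a file without knowing where it originally lived. The cost is that something has to remember the mapping from hash back to source file.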

The preprocess has a step called clean-preprocess that can rewrite the temporary file names to match the source resource file names. However, this rewrite is disabled for PDF output because the original file names are not used for anything in that output type.

answered Mar 26 '20 at 16:34

jelovirt

5,844
8
38
49

Thank you so much for the reply. I'm confused about one thing: You say the original file names are not used for anything in the PDF output type. How is that so? In the 'working' PDF example, the .FO file generated has the original filenames restored. In the 'broken' PDF example, the filenames are still hashed, and of course, files with those names don't exist, so Antenna House can't find them to include them in the PDF. – Andrew Mark Mar 26 '20 at 17:04
To answer my own question in the previous comment: "How is it that the original file names are not used for anything in the PDF output type?" When the hashed filenames are created during preprocess, the hash and the original filename are stored in the ".job" file. Later, in the topic/image template in the PDF2 plugin's "topic.xsl", the hashes are used to look up the filenames in that ".job" file, and output the original filename instead of the hash. (In my case, this step was not working because of a DITA customization and I had to change the logic a bit to fix it.) – Andrew Mark Apr 03 '20 at 23:57
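The lookup described in the comment above can be sketched roughly as follows. This is a Python illustration of the idea, not the XSLT in topic.xsl. The `<job>/<files>/<file>` layout and the `uri` and `src` attribute names are assumptions about the ".job" file, and the fallback of appending the href to the input folder is only one plausible reading of the broken Topic.fo output in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of the job file: one entry mapping the hashed
# temporary name (uri) back to the original source file (src).
JOB_XML = """<job>
  <files>
    <file uri="df2d132af27436c59c5c8c4282e112d62bec8201.png"
          src="file:/V:/Vasont/Extract/t12340879-minimal/MyRemindersChooseReminder.png"/>
  </files>
</job>"""

BASE_DIR = "file:/V:/Vasont/Extract/t12340879-minimal/"


def build_lookup(job_xml: str) -> dict:
    """Map each hashed temporary name to its original source URI."""
    root = ET.fromstring(job_xml)
    return {f.get("uri"): f.get("src") for f in root.iter("file")}


def resolve_href(href: str, lookup: dict, base_dir: str) -> str:
    """Return the original source URI if the hash is known.

    Assumed fallback: when the lookup fails, the hashed name is appended to
    the input folder, which yields a path to a file that does not exist.
    """
    return lookup.get(href, base_dir + href)


lookup = build_lookup(JOB_XML)
hashed = "df2d132af27436c59c5c8c4282e112d62bec8201.png"

working = resolve_href(hashed, lookup, BASE_DIR)  # lookup succeeds
broken = resolve_href(hashed, {}, BASE_DIR)       # lookup finds nothing
print(working)
print(broken)
```

If a customization prevents the lookup from matching, the second case applies and the formatter is handed the hashed file name, which matches the symptom in the 'vcr2' output.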
