
So I have this legacy proprietary database that can spew some well-formed XML.

I'd like to XSLT the shit out of some records to obtain OpenOffice documents nice enough to please a secretary, be signed by the big boss and sent out as official corporate snail mail. Legally binding so better be exact. The resulting document is pretty simple, your average business letter with maybe a table or two, two pages at the very most. Don't ask me why they still use snail mail.

I see that OpenOffice documents are, of course, XML too. (MS Office too maybe, that's an option, but I'll stick to OO for now).

My experience with XSLT doesn't go much further than basic tutorials, some years back.

I've quickly scouted the web in search of the OOo DTD, and it's more elusive than I thought.

I'd appreciate some pointers to get me started.

1/ Where are those damned OpenOffice DTDs ?

2/ There must be some examples of XSLT to OOo out there. Know any?

3/ What would be the correct thought process? Of course, I could parse the original XML and generate the output on the fly, element by element, but that would be tedious and I'd rather not go that way. My concern here is to find a way to write an adequate XSLT stylesheet. Where should I start?

To give it all a little substance, please find enclosed a simplified mockup of the original XML.

<document>
    <metadata>Don't care</metadata>
    <body>
        <sendto>
            <person>Mrs Jane Doe</person>
            <street>Pensylvania Av.</street>
            <number>1234</number>
            <zip>QLD-56789</zip>
            <city>Brisbane</city>
        </sendto>
        <placedate>Bumfuck, AZ, march 29th 2017</placedate>
        <subject>
            Our order #
            <ordernumber>G-27b/6</ordernumber>
        </subject>
        <phrases>
            <phrase>blah</phrase>
            <phrase>bleh</phrase>
        </phrases>
        <order>
            <item>
                <reference>42</reference>
                <name>Bath towel</name>
                <unitprice>4.2</unitprice>
                <quantity>20.0</quantity>
                <totalprice>84.0</totalprice>
            </item>
            <item>...</item>
            ...
            <item>...</item>
            <totalprice>1024.0</totalprice>
        </order>
        <deliverto>
            <person>...</person>
            <street etc.></street>
        </deliverto>
        <phrases>
            <phrase>...</phrase>
            <phrase>Thx, ciao</phrase>
        </phrases>
        <signature>
            <person>Zap Branigan</person>
            <title>Director of corporate stuffs</title>
        </signature>
    </body>
</document>
Éric Viala
  • Have you read this: https://en.wikipedia.org/wiki/Office_Open_XML_file_formats ? – michael.hor257k Aug 25 '16 at 05:52
  • I haven't, and I feel ashamed, and I will. Thank you, that's exactly the kind of starters I'm looking for. – Éric Viala Aug 25 '16 at 05:57
  • Well, so you should :-) Note especially the fact that OOXML documents are "*ZIP files containing XML and other data files*" - and as such cannot be produced by XSLT alone. But do see also: https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats – michael.hor257k Aug 25 '16 at 06:10
  • Funny. At home on Ubuntu I have the option to save docs as "OpenDocument Text (Flat XML) (.fodt)" which is indeed plain XML. Now at the office on Windows, I don't. Maybe an add-on or something. Anyway, the *other data files* are either plain text or simple XML and shouldn't be too difficult to synthesize. – Éric Viala Aug 25 '16 at 06:59
  • If there is no need to edit anything after synthesizing, you might also look into XSL-FO, an XML vocabulary geared towards formatting a PDF document. Not sure it is easier than OO in the end, though. – Stefan Hegny Aug 25 '16 at 19:21
  • Actually, the whole point is to enable editing, so that a human can have the last word on what's actually mailed. Anyway, XSL-FO seems like something worth looking into. It does appear to be geared toward PDF, but there are probably other uses ? Anyway, after the mail is signed and sent, it may be interesting to freeze it in PDF for future reference. – Éric Viala Aug 26 '16 at 06:04
  • @michael.hor257k : thanks again for the links. I went through them, interesting indeed. As I said, going the Microsoft way is indeed an option, but would be only the second best. I get that Office Open XML is an ISO thing, but somehow I don't really trust Microsoft to be transparent and reliable with standards, even those they pushed themselves. – Éric Viala Aug 26 '16 at 06:10

1 Answer


In case anyone is interested, here is what I have come up with so far. By the way, the source database is Lotus Domino. To simplify things, I quickly designed a very simple database dealing with "things" that have a shape, a color and a size. Three fields, it's a start.

I used LibreOffice. It has the native option of saving and reading documents as flat XML (.fodt), which OpenOffice does not. For design and transformation, good old Notepad++ with the "XML Tools" add-on.

1/ The freshly exported xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="shapeshifter.xsl"?>
<document xmlns="http://www.lotus.com/dxl" version="9.0" maintenanceversion="1.0" replicaid="C125801D0049FEC5" form="Thingy">
    <noteinfo noteid="8f6" unid="25031389A7D0B3E4C125801D004B9E77" sequence="8">
        <created>
            <datetime dst="true">20160828T154557,67+02</datetime>
        </created>
        <modified>
            <datetime dst="true">20160828T160525,82+02</datetime>
        </modified>
        <revised>
            <datetime dst="true">20160828T160525,81+02</datetime>
        </revised>
        <lastaccessed>
            <datetime dst="true">20160828T160525,82+02</datetime>
        </lastaccessed>
        <addedtofile>
            <datetime dst="true">20160828T154610,89+02</datetime>
        </addedtofile>
    </noteinfo>
    <updatedby>
        <name>CN=&#xC9;ric/O=Org</name>
    </updatedby>
    <revisions>
        <datetime dst="true">20160828T154610,88+02</datetime>
        <datetime dst="true">20160828T154724,42+02</datetime>
        <datetime dst="true">20160828T154926,61+02</datetime>
        <datetime dst="true">20160828T155209,03+02</datetime>
        <datetime dst="true">20160828T155257,07+02</datetime>
        <datetime dst="true">20160828T155950,07+02</datetime>
        <datetime dst="true">20160828T160018,99+02</datetime>
    </revisions>
    <item name="$EncryptionStatus">
        <text>0</text>
    </item>
    <item name="$SignatureStatus">
        <text>0</text>
    </item>
    <item name="Shape">
        <text>oval</text>
    </item>
    <item name="Color">
        <text>red</text>
    </item>
    <item name="Size">
        <text>medium</text>
    </item>
</document>

2/ The XSLT

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:d="http://www.lotus.com/dxl"
        xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
        xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
        >
    <xsl:output method="xml" encoding="utf-8" version="1.0" media-type="application/xml" indent="yes"/>
    <xsl:template match="/">
        <xsl:apply-templates select="d:document"/>
    </xsl:template>

    <xsl:template match="d:document">

        <office:document xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
            xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
            office:version="1.2"
            office:mimetype="application/vnd.oasis.opendocument.text">
            <office:body>
                <office:text>
                    <text:sequence-decls>
                        <text:sequence-decl text:display-outline-level="0"
                                text:name="Illustration"/>
                        <text:sequence-decl text:display-outline-level="0"
                                text:name="Table"/>
                        <text:sequence-decl text:display-outline-level="0"
                                text:name="Text"/>
                        <text:sequence-decl text:display-outline-level="0"
                                text:name="Drawing"/>
                    </text:sequence-decls>
                    <text:p text:style-name="P1">
                        <xsl:text>We see here an object whose color is </xsl:text>
                        <xsl:apply-templates select="d:item[@name='Color']"/>
                        <xsl:text> and in the shape of a </xsl:text>
                        <xsl:apply-templates select="d:item[@name='Shape']"/>
                        <xsl:text>. Note the </xsl:text>
                        <xsl:apply-templates select="d:item[@name='Size']"/>
                        <xsl:text> size.</xsl:text>     
                    </text:p>
                </office:text>
            </office:body>
        </office:document>          

    </xsl:template>

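    <!-- Fallback for any element without a more specific template.
         The apply-templates calls above only select elements that have
         their own template, so this one never fires for the sample document. -->
    <xsl:template match="*">
        <xsl:text>A thing.&#10;</xsl:text>
    </xsl:template>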
    <xsl:template match="d:item[@name='Color']">
        <xsl:value-of select="d:text"/>
    </xsl:template>
    <xsl:template match="d:item[@name='Shape']">
        <xsl:value-of select="d:text"/>
    </xsl:template>
    <xsl:template match="d:item[@name='Size']">
        <xsl:value-of select="d:text"/>
    </xsl:template>
</xsl:stylesheet>

3/ The output

<?xml version="1.0" encoding="utf-8"?>
<office:document xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:d="http://www.lotus.com/dxl" office:version="1.2" office:mimetype="application/vnd.oasis.opendocument.text">
  <office:body>
    <office:text>
      <text:sequence-decls>
        <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
      </text:sequence-decls>
      <text:p text:style-name="P1">We see here an object whose color is red and in the shape of a oval. Note the medium size.</text:p>
    </office:text>
  </office:body>
</office:document>

The result can be saved as something.fodt and directly opened with LibreOffice.
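
A possible tidy-up of the stylesheet above, offered as a sketch rather than a tested replacement. The `xmlns:d` declaration appears on the output root because the DXL namespace is in scope on the literal result elements; `exclude-result-prefixes="d"` on `xsl:stylesheet` should suppress it. The three identical item templates are collapsed into one, the unused fallback template is dropped, and the version is set to 1.0 on the assumption that nothing here needs XSLT 2.0. The sequence declarations and the `P1` style reference are left out for brevity.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:d="http://www.lotus.com/dxl"
        xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
        xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
        exclude-result-prefixes="d">
    <xsl:output method="xml" encoding="utf-8" indent="yes"/>

    <!-- Entry point: the DXL document element -->
    <xsl:template match="/d:document">
        <office:document office:version="1.2"
                office:mimetype="application/vnd.oasis.opendocument.text">
            <office:body>
                <office:text>
                    <text:p>
                        <xsl:text>We see here an object whose color is </xsl:text>
                        <xsl:apply-templates select="d:item[@name='Color']"/>
                        <xsl:text> and in the shape of a </xsl:text>
                        <xsl:apply-templates select="d:item[@name='Shape']"/>
                        <xsl:text>. Note the </xsl:text>
                        <xsl:apply-templates select="d:item[@name='Size']"/>
                        <xsl:text> size.</xsl:text>
                    </text:p>
                </office:text>
            </office:body>
        </office:document>
    </xsl:template>

    <!-- One template for every simple text item: output its text value -->
    <xsl:template match="d:item">
        <xsl:value-of select="d:text"/>
    </xsl:template>
</xsl:stylesheet>
```

The namespace declarations on `office:document` are not repeated because the ones on `xsl:stylesheet` are already in scope and get copied to the output.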

The next steps :

  • automate the process so that, from the user's point of view, it all happens at once at the click of a button.

  • more work on the XSL template. A real-life Domino document is much more complex than that, with multiple levels of nesting and many more element types.

  • and of course the whole point is to generate a nicely formatted document, so that will mean more digging into the niceties of the OASIS OpenDocument schema (published as RELAX NG rather than DTDs) ...
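
For the last two points, a rough sketch of where this could go. It dispatches on the kind of value an item holds and defines the `P1` paragraph style in `office:automatic-styles`. The `d:number` and `d:textlist` element names are my understanding of DXL and do not appear in the sample above, so treat them as assumptions. The style property values are purely illustrative.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:d="http://www.lotus.com/dxl"
        xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
        xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
        xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
        xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
        exclude-result-prefixes="d">
    <xsl:output method="xml" encoding="utf-8"/>

    <xsl:template match="/d:document">
        <office:document office:version="1.2"
                office:mimetype="application/vnd.oasis.opendocument.text">
            <!-- Defines the P1 paragraph style; values are illustrative -->
            <office:automatic-styles>
                <style:style style:name="P1" style:family="paragraph">
                    <style:paragraph-properties fo:text-align="justify"
                            fo:margin-bottom="0.3cm"/>
                    <style:text-properties fo:font-size="11pt"/>
                </style:style>
            </office:automatic-styles>
            <office:body>
                <office:text>
                    <!-- One "Name: value" paragraph per item, skipping
                         system items whose name starts with "$" -->
                    <xsl:for-each select="d:item[not(starts-with(@name, '$'))]">
                        <text:p text:style-name="P1">
                            <xsl:value-of select="@name"/>
                            <xsl:text>: </xsl:text>
                            <xsl:apply-templates select="."/>
                        </text:p>
                    </xsl:for-each>
                </office:text>
            </office:body>
        </office:document>
    </xsl:template>

    <!-- Single-valued items: emit the text or number as-is -->
    <xsl:template match="d:item[d:text or d:number]">
        <xsl:value-of select="d:text | d:number"/>
    </xsl:template>

    <!-- Multi-valued items (assumed d:textlist): join values with commas -->
    <xsl:template match="d:item[d:textlist]">
        <xsl:for-each select="d:textlist/d:text">
            <xsl:value-of select="."/>
            <xsl:if test="position() != last()">, </xsl:if>
        </xsl:for-each>
    </xsl:template>

    <!-- Any other item type: flag it visibly instead of dropping it -->
    <xsl:template match="d:item">
        <xsl:text>[unhandled item type]</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The templates with a predicate take precedence over the bare `d:item` one through default priorities, so new item types can be added one template at a time while the fallback keeps unhandled ones visible in the letter.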

But at least now I have a proof of concept and the outline of a method.
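
As for the table from the question, a sketch written against the question's mock-up XML (no namespace, `order/item` children) rather than real DXL. It produces an unstyled `table:table`, so borders and column widths would still need table styles. The table name "Order" and the "Total: " label are placeholders of mine.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
        xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
        xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
    <xsl:output method="xml" encoding="utf-8"/>

    <xsl:template match="/document">
        <office:document office:version="1.2"
                office:mimetype="application/vnd.oasis.opendocument.text">
            <office:body>
                <office:text>
                    <xsl:apply-templates select="body/order"/>
                </office:text>
            </office:body>
        </office:document>
    </xsl:template>

    <!-- One table per order, followed by the order total -->
    <xsl:template match="order">
        <table:table table:name="Order">
            <table:table-column table:number-columns-repeated="5"/>
            <!-- Header row built from the element names of the first item -->
            <table:table-header-rows>
                <table:table-row>
                    <xsl:for-each select="item[1]/*">
                        <table:table-cell office:value-type="string">
                            <text:p><xsl:value-of select="name()"/></text:p>
                        </table:table-cell>
                    </xsl:for-each>
                </table:table-row>
            </table:table-header-rows>
            <xsl:apply-templates select="item"/>
        </table:table>
        <text:p>Total: <xsl:value-of select="totalprice"/></text:p>
    </xsl:template>

    <!-- One row per item -->
    <xsl:template match="item">
        <table:table-row>
            <xsl:apply-templates select="*"/>
        </table:table-row>
    </xsl:template>

    <!-- One cell per field of an item -->
    <xsl:template match="item/*">
        <table:table-cell office:value-type="string">
            <text:p><xsl:value-of select="."/></text:p>
        </table:table-cell>
    </xsl:template>
</xsl:stylesheet>
```

The column count is hard-coded to the five fields of the mock-up item. With real data it would have to match whatever the export actually contains.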

Éric Viala