I am trying to automatically flatten any XML file using XSLT. Is it achievable? I am guessing it is, but I cannot find a way to do it.

Example input

<person>
    <name>
        <first>John</first>
        <last>Doe</last>
    </name>
    <data>
        <address>
            <street>Main</street>
            <city>Los Angeles</city>
        </address>
    </data>
</person>

Expected output

<person>
    <name_first>John</name_first>
    <name_last>Doe</name_last>
    <data_address_street>Main</data_address_street>
    <data_address_city>Los Angeles</data_address_city>
</person>

I have tried many things, but the closest I have got is the following, adapted from this answer.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    exclude-result-prefixes="xs" version="2.0">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/*/*">
        <xsl:for-each select="*">
            <xsl:element name="{concat(name(..),'_',name())}">
                <xsl:apply-templates select="node()"/>
            </xsl:element>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

As @Michael Kay comments, one example does not constitute a specification. To be explicit: comments, processing instructions, mixed content, and anything else not shown in the example should be ignored.

onzinsky
  • One example does not constitute a specification: you say "any XML", but that means you need to specify what happens to attributes, comments, processing instructions, and mixed content. – Michael Kay Feb 17 '21 at 00:52
  • Thanks for your comment. You're right. I edited my question and tried to explain that any comments, processing instructions, and so on, should be ignored. The fact is I'm trying to keep the question short. Any other suggestions for improving it are welcome. – onzinsky Feb 17 '21 at 09:30

1 Answer

You can do it with string-join:

  <xsl:template match="/*">
      <xsl:copy>
          <xsl:apply-templates select="descendant::*[not(*)]"/>
      </xsl:copy>
  </xsl:template>
  
  <xsl:template match="*">
      <xsl:element name="{string-join(ancestor-or-self::*[position() ne last()]/name(), '_')}">
          <xsl:value-of select="."/>
      </xsl:element>
  </xsl:template>
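
For anyone who wants to run the two templates above as they are, here is a minimal sketch of a complete stylesheet built around them. The wrapper (`xsl:stylesheet` with `version="2.0"` and an indenting `xsl:output`) is an assumption that follows the layout of the stylesheet in the question; it needs an XSLT 2.0 or later processor because of `string-join` and the `ne` operator.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <!-- Assumed output settings, same as in the question's stylesheet -->
    <xsl:output method="xml" indent="yes"/>

    <!-- Copy the root element and process only leaf elements,
         i.e. descendants that have no element children -->
    <xsl:template match="/*">
        <xsl:copy>
            <xsl:apply-templates select="descendant::*[not(*)]"/>
        </xsl:copy>
    </xsl:template>

    <!-- Build the new name from the names of the leaf and its ancestors.
         On this reverse axis the root element is the last node, so
         [position() ne last()] drops it from the path -->
    <xsl:template match="*">
        <xsl:element name="{string-join(ancestor-or-self::*[position() ne last()]/name(), '_')}">
            <xsl:value-of select="."/>
        </xsl:element>
    </xsl:template>

</xsl:stylesheet>
```

Applied to the example input, this should give the four flattened elements (`name_first`, `name_last`, `data_address_street`, `data_address_city`) inside `person`; the exact indentation depends on the processor. Because only leaf elements are selected and `xsl:value-of` outputs their string value, comments, processing instructions and attributes do not reach the output. Note that `name()` keeps any namespace prefix, so documents with prefixed element names would likely need `local-name()` instead.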

For huge documents, XSLT 3 with streaming (e.g. Saxon EE) lets you do:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">
    
    <xsl:mode streamable="yes"/>
    
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:template match="/*">
        <xsl:copy>
            <xsl:apply-templates select="descendant::text()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="text()">
        <xsl:element name="{string-join(ancestor::*[position() lt last()]/name(), '_')}">
            <xsl:value-of select="."/>
        </xsl:element>
    </xsl:template>
    
</xsl:stylesheet>
Martin Honnen
  • Lovely! Thank you so much!! – onzinsky Feb 16 '21 at 18:27
  • I cannot use xslt-3 at the moment. But I'm curious, do you mean your first option (using xslt-2) wouldn't work for "huge documents"? – onzinsky Feb 16 '21 at 23:01
  • XSLT 1 and 2 work on an in-memory input tree of the complete XML input document, so for gigabytes of XML you can run into memory problems. Streaming in XSLT 3 is a way to use a subset of XSLT without building the complete input tree, instead parsing and processing the input in a single, forwards-only pass. – Martin Honnen Feb 16 '21 at 23:13
  • Thanks for your explanation and your patience. I'm not expecting particularly large documents, but I'll try to use xslt-3 anyway if possible. Also, I added a caveat to my question, so I marked your answer as not accepted again. Maybe it should be a different question? – onzinsky Feb 17 '21 at 10:01
  • @onzinsky, it is never good style to complicate the requirements after an answer seems to have solved the original problem. First try to solve the more complicated case on your own; if that doesn't work, ask it as a new question showing what you tried and how it failed. It seems that `` might suffice. – Martin Honnen Feb 17 '21 at 10:14
  • You're absolutely right. Sorry about that. I'll remove the edit and accept your answer again. – onzinsky Feb 17 '21 at 10:24