I am trying to generate an XML document from another XML document, based on a defined list of XPath expressions.

XPATH:

country/name,
country/org_id,
country/lang,
country/currency,
generate_date,
schedule/category/id,
schedule/category/name,
schedule/category/classes/class/id,
schedule/category/classes/class/duration,
schedule/category/classes/class/price,
schedule/category/classes/class/instruction_language

The XPath expressions omit the name of the root node and are supplied as a list.

XML:

<?xml version="1.0" encoding="utf-8" ?>
<ou_schedule>
  <country>
    <name>Country Name</name>
    <org_id>Org ID</org_id>
    <lang>language</lang>
    <currency>Currency</currency>
  </country>
  <generate_date>Date</generate_date>
  <schedule>
    <category>
      <id>cat id</id>
      <name>Cat name</name>
      <classes>
        <class>
          <id>class id</id>
          <duration>class duration</duration>
          <price>price</price>
          <instruction_language>Test Data</instruction_language>
        </class>
        <class>
          <id>class id</id>
          <duration>class duration</duration>
          <price>price</price>
          <instruction_language>Test Data</instruction_language>
        </class>
      </classes>
    </category>
  </schedule>
</ou_schedule>

Output:

<?xml version="1.0" encoding="utf-8"?>
<ou_schedule>
  <country.name>country name</country.name>
  <country.org_id>org id</country.org_id>
  <country.lang>language</country.lang>
  <country.currency>currency</country.currency>
  <generate_date>date</generate_date>
  <schedule.category.name>Cat Name</schedule.category.name>
  <schedule.category.id>Cat ID</schedule.category.id>
  <schedule.category.classes.class.id>class id</schedule.category.classes.class.id>
  <schedule.category.classes.class.duration>class duration</schedule.category.classes.class.duration>
  <schedule.category.classes.class.price>price</schedule.category.classes.class.price>
  <schedule.category.classes.class.instruction_language>Test Data</schedule.category.classes.class.instruction_language>

  <country.name>country name</country.name>
  <country.org_id>org id</country.org_id>
  <country.lang>language</country.lang>
  <country.currency>currency</country.currency>
  <generate_date>date</generate_date>
  <schedule.category.name>Cat Name</schedule.category.name>
  <schedule.category.id>Cat ID</schedule.category.id>
  <schedule.category.classes.class.id>class id</schedule.category.classes.class.id>
  <schedule.category.classes.class.duration>class duration</schedule.category.classes.class.duration>
  <schedule.category.classes.class.price>price</schedule.category.classes.class.price>
  <schedule.category.classes.class.instruction_language>Test Data</schedule.category.classes.class.instruction_language>
</ou_schedule>

Here, to remove ambiguity, I name each node after its ancestors (excluding the root node), i.e. the same as its XPath but with / replaced by a dot (.).

Is it possible to achieve this using some generic XSLT?

Abhishekh Gupta
  • What other format for example? text? JSON? – Alexander Apr 24 '15 at 14:07
  • try XSLT to transform the XML into another format. – Alexander Apr 24 '15 at 14:10
  • Is this list of XPaths static, or will it ever change? -- Note that the output you show us is not well-formed XML (has no root element). – michael.hor257k Apr 24 '15 at 15:22
  • Your question is not well-defined: the input does not determine the output. Why are your XPaths partial (not starting from the root)? And how did `schedule/category/name` turn into `category_name` under `class`? Without some more input, this task is not possible. – michael.hor257k Apr 25 '15 at 09:11
  • @Beginner That's a VERY different question from the one you asked before. Could you also explain how do you plan to update the XPaths? Will you be passing them as a parameter to the stylesheet, or what? And which XSLT processor will you be using? – michael.hor257k Apr 26 '15 at 18:18
  • Yes, I am passing them as a parameter. It will be helpful if the solution is generic, as I am using two different processors: `SAXON 9.3.0.5` and `Apache Xalan`. – Abhishekh Gupta Apr 26 '15 at 18:24
  • @Beginner Your new update makes your question ill-defined again. All the XPaths given in your example have the same hierarchy. There is no way to tell the stylesheet to create a group for each occurrence of a specific XPath - unless you somehow designate that XPath as special. – michael.hor257k Apr 27 '15 at 09:50
  • @michael.hor257k Basically I am getting an XML and I want to convert it into a normalized XML. And then perform some operations. Can a nested XML be converted into csv? – Abhishekh Gupta Apr 27 '15 at 10:12
  • @Beginner I don't know what "normalized XML" means. The common scenario with XML data-processing is that you get an XML that **conforms to a known schema**, and you output another document (XML, HTML, text or PDF) that conforms to another schema - also known in advance. With the two known schemas in front of you, you can compose a custom XSLT stylesheet that will function for any given conforming input. Attempts to make the XSLT generic can be successful to some extent, but only if some stated constraints are observed. In its current form, your problem can be solved only through clairvoyance. – michael.hor257k Apr 27 '15 at 10:13
  • "*Can a nested XML be converted into csv"* Yes, if the hierarchy of nesting is known in advance. Otherwise, no. – michael.hor257k Apr 27 '15 at 10:14

2 Answers

Is it possible to achieve this using some generic XSLT?

If there are two solutions, one for XSLT 1.0 and one for XSLT 2.0, they can be combined (rather artificially) into a single stylesheet using techniques such as XSLT 2.0 conditional compilation (the use-when attribute), which excludes the templates and declarations of the XSLT 1.0 solution at "pre-compile time" when an XSLT 2.0 processor is used. An XSLT 1.0 processor, on the other hand, runs the stylesheet in forwards-compatible mode; if the templates of the XSLT 1.0 solution are given a higher priority than those of the XSLT 2.0 solution, no XSLT 2.0 template is selected for execution when the transformation is run with an XSLT 1.0 processor.

One can regard this as an interesting exercise and follow the example in Michael Kay's book "XSLT 2.0 and XPath 2.0", Chapter 3: "Stylesheet Structure", Section "Writing Portable stylesheets", Subsection "Conditional Compilation". The example (in the edition I have) is on page 128.


Solution 1:

Here is a short, pure XSLT 2.0 solution (18 lines if the parameter values are omitted; no extension functions) that uses neither explicit XSLT conditional instructions nor any xsl:variable. Even the tokenize() function isn't used:

<xsl:stylesheet version="2.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:param name="pPaths" as="xs:string+" select=
  "'country/name',
   'country/org_id',
   'country/lang',
   'country/currency',
   'generate_date',
   'schedule/category/id',
   'schedule/category/name',
   'schedule/category/classes/class/id',
   'schedule/category/classes/class/duration',
   'schedule/category/classes/class/price',
   'schedule/category/classes/class/instruction_language'"/>

  <xsl:template match="/*">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>

  <xsl:template match=
   "*/*[string-join((ancestor::*[position() ne last()]| .)/name(), '/') = $pPaths]">
    <xsl:element 
       name="{string-join((ancestor::*[position() ne last()]|.)/name(), '.')}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
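
To make the matching rule in the stylesheet above easier to follow, here is a minimal sketch of the same logic outside XSLT, written in Python with only the standard library. It is an illustration, not a replacement for the stylesheet: the trimmed `SOURCE` document, the `PATHS` set and the `flatten` helper are all hypothetical names introduced for this example. For every non-root element it joins the element names from below the root with `/`, checks that string against the path list, and on a match emits an element whose name is the same steps joined with `.`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input and path list, trimmed from the question.
SOURCE = """<ou_schedule>
  <country><name>Country Name</name><lang>language</lang></country>
  <generate_date>Date</generate_date>
  <schedule><category><id>cat id</id></category></schedule>
</ou_schedule>"""

PATHS = {"country/name", "generate_date", "schedule/category/id"}


def flatten(source, paths):
    """Return a new root whose children are the selected elements, dot-named."""
    root = ET.fromstring(source)
    result = ET.Element(root.tag)  # keep the root element name, as the stylesheet does

    def walk(element, ancestors):
        for child in element:
            steps = ancestors + [child.tag]  # names below the root, in document order
            if "/".join(steps) in paths:
                # Matched: emit <a.b.c> holding the element's string value.
                flat = ET.SubElement(result, ".".join(steps))
                flat.text = "".join(child.itertext())
            else:
                # Not matched: keep descending, like the built-in template rules.
                walk(child, steps)

    walk(root, [])
    return result


flat_root = flatten(SOURCE, PATHS)
print(ET.tostring(flat_root, encoding="unicode"))
```

As in the stylesheet, elements that are not in the list (here `country/lang`) simply produce no output, and selected elements appear once each in document order.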

Solution 2:

Here the URI (filepath) of a resource (file) is passed as a parameter. This file contains all wanted XPath expressions -- each one on a separate line.

<xsl:stylesheet version="2.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:param name="pFilePath" select="'file:///C:/temp/expressions.txt'"/>

 <xsl:variable name="vExprs" select="tokenize(unparsed-text($pFilePath), '\r?\n')"/>

  <xsl:template match="/*">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>

  <xsl:template match=
   "*/*[string-join((ancestor::*[position() ne last()]| .)/name(), '/') = $vExprs]">
    <xsl:element name=
       "{string-join((ancestor::*[position() ne last()]|.)/name(), '.')}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
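
One practical detail of reading the expressions from a file is what the line-splitting step produces. The sketch below mimics the `'\r?\n'` split in Python; the file contents in `raw` are a hypothetical example with Windows line endings and a trailing line break. It suggests that a trailing line break leaves an empty final token, which cannot equal any element path and is therefore harmless, while stray whitespace around an expression would prevent a match and may be worth trimming when the file is prepared.

```python
import re

# Hypothetical file contents: CRLF line endings, one padded line, trailing newline.
raw = "country/name\r\n generate_date \r\nschedule/category/id\r\n"

# Same regular expression as in the stylesheet's tokenize() call.
tokens = re.split(r"\r?\n", raw)

# Trim each token and drop empty ones before using them as the path list.
paths = [t.strip() for t in tokens if t.strip()]

print(tokens)
print(paths)
```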

Solution 3:

Both previous solutions can be further optimized and simplified, if for the input XPath expressions it is known that they select elements that have a single text-node child (and this is the case with the originally-provided input XPath expressions and provided source XML document):

<xsl:stylesheet version="2.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:param name="pFilePath" select="'file:///C:/temp/expressions.txt'"/>

 <xsl:variable name="vExprs" select="tokenize(unparsed-text($pFilePath), '\r?\n')"/>

  <xsl:template match="/*">
    <xsl:copy><xsl:apply-templates/></xsl:copy>
  </xsl:template>

  <xsl:template match=
   "text()[string-join(ancestor::*[position() ne last()]/name(), '/') = $vExprs]">
    <xsl:element 
      name="{string-join(ancestor::*[position() ne last()]/name(), '.')}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
Dimitre Novatchev
  • @michael.hor257k, Dimitre Novatchev: In my output I am trying to select the deepest element node and then return all its child nodes and its ancestor nodes. The XML format is not fixed and I only have these XPaths, so I am trying to convert it into a linear format. Is it possible? – Abhishekh Gupta Apr 27 '15 at 07:06
  • @Beginner, It is possible to find all deepest elements (there may be more than one with a maximum depth). Such an element can have only a text-node child. Do you mean the deepest of the elements selected by the XPath expressions? Also, what is "it" in the statement "I am trying to convert it in a linear format"? – Dimitre Novatchev Apr 27 '15 at 13:36
  • Yes, the deepest element node in the XPath list. "It" means the XML. I only have the XPath list and I want the XML structure to be the same as a CSV structure. – Abhishekh Gupta Apr 27 '15 at 15:19
  • @Beginner, This is still quite unclear. I recommend that you ask a new question, provide the source XML document, point out which nodes you want to have the XPath expressions for, and what you want the transformation to do (rules/constraints) and what the result should be, and what is the correct result given the exact source XML document you have provided. – Dimitre Novatchev Apr 27 '15 at 15:24
  • Can the same be achieved using `XSLT 1.0` as my processor is `XALAN`? – Abhishekh Gupta May 14 '15 at 10:25
  • @Beginner, Please, do your homework and learn. It isn't realistic to expect that anyone would explain to you very basic facts about XSLT/XPath, that other people learn by themselves without problems. Experiment with the code if it doesn't produce *exactly* what you want. – Dimitre Novatchev May 14 '15 at 15:09
  • I completed it. Thanks – Abhishekh Gupta May 14 '15 at 15:17
  • @Beginner, Sorry if my last comment was too direct. I can recommend you (shamelessly) this Pluralsight training course on XSLT (2.0 and 1.0): http://www.pluralsight.com/courses/xslt-foundations-part1 – Dimitre Novatchev May 14 '15 at 15:25
My first thought was: interesting, here we get a dynamically built XSL transformation. But this does not seem achievable, as "dynamic xpath in xslt" explains.

So a second idea is needed: you can think of an XSL transformation as a list of XPath expressions. In this sense you just need an XSLT file like the following:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" indent="yes"/>

    <!-- the following select-attributes are the set of XPATH expressions 
         (relative to /ou_schedule/schedule/category/classes/class) -->
    <xsl:template name="XPathList">
        <category_name>
            <xsl:apply-templates select="ancestor::category/name"/>
        </category_name>

        <category_id>
            <xsl:apply-templates select="ancestor::category/id"/>
        </category_id>

        <id>
            <xsl:apply-templates select="id"/>
        </id>

        <duration>
            <xsl:apply-templates select="duration"/>
        </duration>

        <price>
            <xsl:apply-templates select="price"/>
        </price>

        <instruction_language>
            <xsl:apply-templates select="instruction_language"/>
        </instruction_language>
    </xsl:template>

    <!-- Basis -->
    <xsl:template match="/">
        <ou_schedule>
            <xsl:apply-templates select="//class"/>
        </ou_schedule>
    </xsl:template>

    <xsl:template match="class">
        <xsl:copy>
            <xsl:call-template name="XPathList"/>
        </xsl:copy>    
    </xsl:template>
</xsl:stylesheet>

Well, one could have written this transformation in a more compact way. But the aim was to translate the idea of "having a list of XPATHs to transform an XML" into code.
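
The shape of the result, one flat record per class that repeats the values of its enclosing category, can also be sketched outside XSLT. The Python below is only an illustration under assumptions: the trimmed `SOURCE` document and its values (`c1`, `2h`, and so on) are hypothetical, and only two of the class fields are copied.

```python
import xml.etree.ElementTree as ET

# Hypothetical input, trimmed from the question's structure.
SOURCE = """<ou_schedule>
  <schedule>
    <category>
      <id>cat id</id>
      <name>Cat name</name>
      <classes>
        <class><id>c1</id><duration>2h</duration></class>
        <class><id>c2</id><duration>3h</duration></class>
      </classes>
    </category>
  </schedule>
</ou_schedule>"""

root = ET.fromstring(SOURCE)
records = []
for category in root.iter("category"):
    # Values shared by every class in this category (the ancestor:: lookups).
    shared = {
        "category_name": category.findtext("name"),
        "category_id": category.findtext("id"),
    }
    for cls in category.iter("class"):
        record = dict(shared)
        # Values taken from the class itself (the relative XPaths).
        for field in ("id", "duration"):
            record[field] = cls.findtext(field)
        records.append(record)

for record in records:
    print(record)
```

Each record corresponds to one `class` element in the stylesheet's output, which is also the row structure a CSV export would need.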

leu