I have an XML document in the following format:

<Contents>
  <Content Name="ClientXML">
    <EntityData>
        <Data Name="EQ_EligibleForGuaranteedIssue">Yes</Data>
        <Data Name="ABRInd">NO</Data>
        <Data Name="AC_AgentNo">12345</Data>
        <Data Name="AC_AgentPersonallyMetWithApplicant">Has</Data>
        <Data Name="AC_City">Pomona</Data>
        <Data Name="AC_FirstName">Kimmy</Data>
        <Data Name="AC_FullName">Kimmy N Jackson</Data>
        <Data Name="AC_Initials">K J</Data>
        <Data Name="AC_LastAndSuf">Jackson</Data>
        ...
    </EntityData>
  </Content>
  <Content Name="UserXML">
    <EntityData>
        <Data Name="TransRefGUID">789-456-123456789-456</Data>
        ...
    </EntityData>
  </Content>
</Contents>

Other information:

  1. There can be several thousand 'Data' nodes under each 'EntityData' object
  2. The value of any 'Name' attribute is never duplicated.

I have to create an XSL transform and am using the xsl:value-of instruction with select="...". My question is: which XPath expression will execute the fastest? For example:

<xsl:value-of select="/Contents/Content[@Name='ClientXML']/EntityData/Data[@Name='..']"/>

or simply

<xsl:value-of select="//Data[@Name='..']"/>

I don't have access to the end server which will eventually run this process, and locally the second option may appear to be a little faster.

I'm wondering whether anyone has an opinion, and whether one of them would be faster on a much larger scale.

Thanks!

I think I can code

2 Answers

Using keys in XSLT will be far faster than an XPath expression, especially one with // which can be very slow to execute and should only be used when necessary.

<xsl:key match="Content" use="@Name" name="MyContentsLookup"/>
...
<xsl:value-of select="key('MyContentsLookup','ClientXML')"/>

An XSLT processor can implement internal search mechanisms to quickly look up a value in tens of thousands of entries, far faster than with XPath.
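
Applied to the Data elements from the question, a key-based lookup might look like the sketch below. The key name DataByName and the looked-up value AC_City are chosen only for illustration, and the sketch assumes that Name values are unique across the whole document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every Data element by its Name attribute.
       "DataByName" is an illustrative name, not taken from the question. -->
  <xsl:key name="DataByName" match="Data" use="@Name"/>

  <xsl:template match="/">
    <!-- Fetch one Data element through the key instead of scanning with //. -->
    <xsl:value-of select="key('DataByName', 'AC_City')"/>
  </xsl:template>
</xsl:stylesheet>
```

Each further lookup reuses the same key() call with a different Name value. Most processors build the index on first use, so the gain tends to grow with the number of lookups.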

I've published an overview of XSLT keys here: http://www.CraneSoftwrights.com/resources/xslkeys/index.htm

G. Ken Holman
  • This response is indeed fastest. However, if there is a situation where you are using the value-of select= xsl function, I was able to run a test: The full path: /Contents/Content[@Name='ClientXML']/EntityData/Data[@Name=$name] was much faster than the short path: //Data[@Name=$name] – I think I can code Oct 15 '13 at 16:46
  • Exactly. That was my point. `//` can be very slow. This is because you are asking the processor to look absolutely everywhere in your document for the item being addressed, and the processor doesn't know to stop when it finds exactly the one you are looking for. Whereas, when you spell out the complete XPath address, the processor isn't looking elsewhere unnecessarily. Thus, your longer expression executed faster. And, after all, we should be taking the extra time to make our code fast rather than using shortcuts unnecessarily. – G. Ken Holman Oct 15 '13 at 21:18

When you say the contents of Name are never duplicated, is that true across the document as a whole, or only within each Content element? If it's true globally, then Ken's technique using keys is ideal. If it's only true locally, you might want to consider setting up a key that combines Content/@Name with Data/@Name.
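
A combined key of that kind could be sketched as follows for XSLT 1.0, pairing the Name of the enclosing Content element with the Name of each Data element. The key name DataByContentAndName and the '|' separator are assumptions made for illustration. The separator must be a character that never occurs in a Name value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Data sits two levels below Content (Content/EntityData/Data),
       so ../../@Name is the Name of the enclosing Content element.
       The key name and the '|' separator are illustrative choices. -->
  <xsl:key name="DataByContentAndName" match="Data"
           use="concat(../../@Name, '|', @Name)"/>

  <xsl:template match="/">
    <!-- Look up AC_City only within the ClientXML content block. -->
    <xsl:value-of
        select="key('DataByContentAndName', concat('ClientXML', '|', 'AC_City'))"/>
  </xsl:template>
</xsl:stylesheet>
```

With this form, the same Name value can appear under both ClientXML and UserXML without the two entries colliding in the index.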

The other thing to bear in mind is that performance depends on your processor. Implementors have a great deal of freedom to optimize the same expression in different ways. Even within the same product family, Saxon-EE will execute the expression //Data[@Name='abc'] very differently from the way Saxon-HE implements it (in effect, Saxon-EE creates keys automatically where needed, rather than requiring you to create them by hand). So you can't ask performance questions except in relation to a specific implementation.

Michael Kay