XML Searching - which is faster, text within nodes or text as attribute value?

Question

I'm not sure whether this is the right question to ask, but out of curiosity, I'd like to know which of the following can be searched faster. For example:

<A>
  <Name>John</Name>
</A>

or

<A>
  <Name n="John"/>
</A>

I have stored millions of text values as attribute values, though each one is only a few characters long. The example above is only meant to illustrate the question.

Now, if I use an XML database such as BaseX or eXist and either search for a name or create an index of all names, which variant will be faster?

The difference between these two variants will be very small compared to those caused by other design decisions you may have to make during the development of your XSLT (e.g. usage of indexes, template match patterns, choice of binary tool for XSLT, number of calls of the tool). In the concrete case my gut feeling would be that the attribute based variant may be slightly faster due to slightly simpler parsing requirements since the contents of the attribute are relatively restricted compared to the general case of a sub tree between the opening and closing tag. — Marcus Rickert, Jul 25 '14 at 13:30

score 2 · Accepted Answer · answered Jul 27 '14 at 12:12

This is implementation-dependent, so one cannot generalize it to all XML databases. In this simple case, though, I would guess the answer is the same for all of them: it does not matter.

I am going to explain what happens in BaseX. Let's say you use the first structure and want to get the <A/> element. You would use an XPath expression like

//A[Name = "John"]

This will be optimized to the following query:

db:text("your-database", "John")/parent::*:Name/parent::*:A

Whereas an XPath for your second data structure would probably look something like this:

//A[Name/@n = "John"]

which will be optimized to be

db:attribute("your-database", "John")/self::*:n/parent::*:Name/parent::*:A

As you can see, apart from one additional path step (because you have to access the attribute), which is very cheap, the major difference is the use of db:text() vs. db:attribute(). But as documented, both of these functions use the value index if it is present (which it is by default), so both are quite fast thanks to the index lookup.

In reality, if you are designing an XML-based application and want to retrieve information later using XQuery, you will almost certainly run into other bottlenecks first, e.g. queries that cannot use an index, or nested for loops.
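To make the symmetry concrete, here is a minimal Python sketch of the idea behind both optimized queries. It is a toy model, not how BaseX is implemented: the dictionaries stand in for the text and attribute value indexes, and the names `build_indexes`, `find_a_by_text`, and `find_a_by_attribute` are made up for this illustration. In both cases the work is one dictionary lookup followed by a short walk up to the parent.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two variants from the question, wrapped in a hypothetical <root>.
ELEMENT_DOC = "<root><A><Name>John</Name></A><A><Name>Jane</Name></A></root>"
ATTRIBUTE_DOC = '<root><A><Name n="John"/></A><A><Name n="Jane"/></A></root>'


def build_indexes(xml_text):
    """Build toy value indexes: value -> nodes, plus a child -> parent map."""
    root = ET.fromstring(xml_text)
    parent_map = {child: parent for parent in root.iter() for child in parent}
    text_index = defaultdict(list)       # text value -> elements holding it
    attribute_index = defaultdict(list)  # attribute value -> (owner, attr name)
    for element in root.iter():
        if element.text and element.text.strip():
            text_index[element.text.strip()].append(element)
        for attr_name, attr_value in element.attrib.items():
            attribute_index[attr_value].append((element, attr_name))
    return text_index, attribute_index, parent_map


def find_a_by_text(xml_text, value):
    """Rough analogue of: text lookup / parent::Name / parent::A."""
    text_index, _, parent_map = build_indexes(xml_text)
    results = []
    for name in text_index.get(value, []):
        parent = parent_map.get(name)
        if name.tag == "Name" and parent is not None and parent.tag == "A":
            results.append(parent)
    return results


def find_a_by_attribute(xml_text, value):
    """Rough analogue of: attribute lookup / self::n / parent::Name / parent::A."""
    _, attribute_index, parent_map = build_indexes(xml_text)
    results = []
    for owner, attr_name in attribute_index.get(value, []):
        parent = parent_map.get(owner)
        if (attr_name == "n" and owner.tag == "Name"
                and parent is not None and parent.tag == "A"):
            results.append(parent)
    return results


if __name__ == "__main__":
    print(len(find_a_by_text(ELEMENT_DOC, "John")))
    print(len(find_a_by_attribute(ATTRIBUTE_DOC, "John")))
```

The only extra work in the attribute variant is the check on the attribute name, which corresponds to the one additional path step in the optimized query.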
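As a rough illustration of why nested loops tend to matter far more than the element-vs-attribute choice, the following Python sketch compares a nested scan with an index-backed lookup. The function names and the sample data are hypothetical, and the dictionary only mimics what a database index does for you.

```python
from collections import defaultdict


def match_nested(wanted, names):
    """Nested loops: every wanted value scans all names, O(len(wanted) * len(names))."""
    return [(w, i) for w in wanted for i, n in enumerate(names) if n == w]


def match_indexed(wanted, names):
    """Build a value -> positions index once, then do one lookup per wanted value."""
    positions = defaultdict(list)
    for i, n in enumerate(names):
        positions[n].append(i)
    return [(w, i) for w in wanted for i in positions.get(w, [])]


if __name__ == "__main__":
    names = ["John", "Mary", "John", "Jane"]  # hypothetical sample values
    wanted = ["John", "Jane"]
    print(match_nested(wanted, names) == match_indexed(wanted, names))
```

Both functions return the same pairs; the difference is that the nested version repeats the full scan for every value, which is the pattern to watch for in XQuery FLWOR expressions over large databases.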
