
I have an XSD, and the requirement is to list the XPaths of all the elements defined in the XSD in a UI, so users can use them to perform DOM-related operations.

Can I programmatically extract the XPaths of all the elements from an XSD?

ZygD
suraj bahl
  • The schema language is complex and allows for constructs like `maxOccurs="unbounded"` or recursion, meaning the number of elements an instance document can contain is not limited. How do you expect to be able to extract the XPath expressions of all elements? Also, what is the path of an element, given that there can be several ways to select a certain node? – Martin Honnen Jun 30 '15 at 13:52
  • It isn't entirely clear to me if this question is asking about the xpath of elements present in the XSD xml document itself, or rather, the list of all possible xpaths that might be found in an xml document described by the XSD. I will try to provide references to solutions I've been working on for both cases in an answer below. – bbarker Jul 10 '17 at 19:48

3 Answers


It can be done, though you need to be aware that the set of all permitted paths is infinite (for example, because of recursion or because of wildcards), so you will need a smart representation of this infinite set, or your code will need to give up and return something like "anything goes" if it finds that the list can't be enumerated. The schema-aware Saxon product does something rather like this when checking a path expression such as `.//para` against the schema: if it knows the type of the context item, it can determine whether `.//para` is capable of selecting anything, and give you a warning if not.

As the first step, you need to build (the relevant part of) the schema component model from the source schema documents. Don't try to do this yourself; it is far too much work. A number of products have an API that allows you to access the schema component model. Saxon allows you to generate the schema component model from source schema documents as an XML representation, using the `-scmout` flag on the Validate command line.

Once you have the schema component model, you can find the permitted children of an element by going to its complex type (if it's a simple type then the answer is trivial) and traversing the tree of particles recursively, looking only for the element particles and wildcard particles (you might decide that if there are wildcard particles, it's best to give up). You might want to consider not only the declared type of the element, but other types derived from that one by extension. You need to know the element declarations of the permitted children, not just the permitted child element names, because of course when it comes to finding the permitted grandchildren, you need to start from the element declaration, as there may be local declarations of elements with the same name.

And of course when you know the relation between element names and their permitted child elements, the set of paths is the transitive closure of this relation.
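The closure step can be sketched in Java as follows. This is only an illustration under simplifying assumptions: the `PathClosure` class, its `enumerate` method, and the example element names are hypothetical, and the child relation is keyed by element name, whereas (as noted above) a real implementation should key it by element declaration so that local declarations sharing a name stay distinct. Recursion is cut off with a marker rather than expanded, since the set of paths is infinite at that point.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class PathClosure {

    /**
     * Lists the paths reachable from the root, given a map from an element
     * name to the names of its permitted children (a stand-in for the
     * relation extracted from the schema component model).
     */
    public static List<String> enumerate(String root, Map<String, List<String>> children) {
        List<String> paths = new ArrayList<>();
        walk(root, "", new ArrayDeque<>(), children, paths);
        return paths;
    }

    private static void walk(String name, String prefix, Deque<String> ancestors,
                             Map<String, List<String>> children, List<String> paths) {
        String path = prefix + "/" + name;
        if (ancestors.contains(name)) {
            // Recursive content model: record a marker instead of descending forever.
            paths.add(path + " (recursive)");
            return;
        }
        paths.add(path);
        ancestors.push(name);
        for (String child : children.getOrDefault(name, List.of())) {
            walk(child, path, ancestors, children, paths);
        }
        ancestors.pop();
    }

    public static void main(String[] args) {
        // Hypothetical child relation, including a self-recursive "section".
        Map<String, List<String>> children = Map.of(
                "order", List.of("item", "note"),
                "item", List.of("title", "section"),
                "section", List.of("para", "section"));
        enumerate("order", children).forEach(System.out::println);
    }
}
```

Wildcard particles are not modelled here; one option, in line with the advice above, is to give up on that branch and report it as "anything goes".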

Michael Kay
  • Thanks for your answer - I've largely used it as the basis in my implementation described in my answer to this post, which only relies on scala.xml (and that used to be part of the standard library!). Still has some rough edges, but with use and interest, hopefully it can be made somewhat more generally useful. – bbarker Jul 10 '17 at 20:27

I've been working on a project that has methods for 1) extracting all XPaths of elements present in an XML document itself (e.g., the schema definition document), or 2) listing all possible XPaths that might be found in an XML document described by the XSD.

If you are only interested in 1), the problem and my solution have been described and answered (albeit in Scala) at "Scala: What is the easiest way to get all leaf nodes and their paths in an XML?"

For 2), things are much more complicated, though in fact I used 1) as a starting point, and both 1) (XpathXmlEnumerator) and 2) (XpathXsdEnumerator) share a common interface (XpathEnumerator), for whatever that is worth. Although 2) is much longer, I suppose at ~500 LOC it is still a rather lean implementation, all things considered (but it could probably use more comments - please bug me to add them!). @michael-kay has done a great job of describing many of the difficulties and outlining a possible solution. Perhaps unfortunately, I did not follow his advice to use software that understands a schema component model, but I did use scala.xml to try to simplify working with XML nodes in general. Still, I believe I overcame all the known difficulties of generating XPaths, since a high percentage of the information/nodes in an XSD does not need to be understood in order to generate XPaths for the documents described by the XSD, so one can simply ignore such nodes.

The idea of filtering becomes important to avoid counting nodes that appear everywhere and that you don't really care about in practice, and possibly also to avoid recursion. However, recursion should be detected automatically by the implementation in 2), and further traversal of the given XPath avoided. For filters, there is early support for a custom NodeFilters class - see DdiCodebookSpec for example usage.

You can see some tests that run in the project in the same directory as ShipOrderXsdSpec, which contains some quickly running examples if you want to give it a try. Some of the other tests are not quickly running, and some have existing problems - this is "pre-alpha" software!

Though the solutions are in Scala, I'd be happy to create a Java wrapper (if needed - it may work directly) and even publish it to Maven if anyone actually wants that.

bbarker
// Start from the document element; getFirstChild() may return a comment or doctype instead.
Node n = doc.getDocumentElement();
NodeList nl = n.getChildNodes();

Then you can go through the list of nodes and get each node's XPath:

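// Uses org.w3c.dom.Node. Intended for element nodes; a text node would yield "#text".
String getXPath(Node node)
{
    Node parent = node.getParentNode();
    // The document element's parent is the Document node itself, so stop there.
    if (parent == null || parent.getNodeType() == Node.DOCUMENT_NODE) {
        return "/" + node.getNodeName();
    }
    // Append this node's own name to the path of its parent.
    return getXPath(parent) + "/" + node.getNodeName();
}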
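
As a fuller illustration of the same idea, here is a minimal, self-contained sketch that parses an XML string with the JDK's DOM parser and collects the distinct element paths top-down. The `ElementPaths` class and its `list` method are hypothetical names chosen for this example. The paths carry no positional predicates, so repeated siblings collapse into one entry. Note that this walks the elements actually present in the parsed document (which could be the XSD file itself); it does not compute the paths that a schema permits.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ElementPaths {

    /** Parses the XML text and returns the distinct element paths in document order. */
    public static Set<String> list(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Set<String> paths = new LinkedHashSet<>();
            collect(doc.getDocumentElement(), "", paths);
            return paths;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    private static void collect(Node node, String prefix, Set<String> paths) {
        String path = prefix + "/" + node.getNodeName();
        paths.add(path);
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            // Skip text, comment, and other non-element nodes.
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                collect(kid, path, paths);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        String xml = "<order><item><title/></item><item><title/><note/></item></order>";
        list(xml).forEach(System.out::println);
    }
}
```

Building the path on the way down avoids re-walking the ancestor chain for every node, which the recursive `getXPath` above does on each call.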
arseniyandru