I have an XSD, and the requirement is to list the XPaths of all the elements it declares in a UI, so that users can use them to perform DOM-related operations.
Can I programmatically extract the XPaths of all the elements from an XSD?
It can be done, but be aware that the set of all permitted paths may be infinite (for example, because of recursion or wildcards). You will therefore need a smart representation of this infinite set, or your code will need to give up and return something like "anything goes" when it finds that the list can't be enumerated. The schema-aware Saxon product does something rather like this when checking a path expression such as .//para against the schema: if it knows the type of the context item, it can determine whether .//para is capable of selecting anything, and give you a warning if not.
As the first step, you need to build (the relevant part of) the schema component model from the source schema documents. Don't try to do this yourself; it is far too much work. A number of products have an API that allows you to access the schema component model. Saxon allows you to generate the schema component model from source schema documents as an XML representation, using the -scmout flag on the Validate command line.
Once you have the schema component model, you can find the permitted children of an element by going to its complex type (if it's a simple type then the answer is trivial) and traversing the tree of particles recursively, looking only for the element particles and wildcard particles (you might decide that if there are wildcard particles, it's best to give up). You might want to consider not only the declared type of the element, but other types derived from that one by extension. You need to know the element declarations of the permitted children, not just the permitted child element names, because of course when it comes to finding the permitted grandchildren, you need to start from the element declaration, as there may be local declarations of elements with the same name.
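The particle traversal described above can be sketched roughly as follows. This is only an illustration: Particle, ElementParticle, Wildcard and Group are hypothetical stand-ins for whatever your schema API exposes, not the API of Saxon or any other product, and the sketch ignores derived types. In a real model the element particle would reference an element declaration rather than just carry a name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParticleWalk {
    // Hypothetical, simplified particle model (assumption, not a real library API).
    interface Particle {}

    static final class ElementParticle implements Particle {
        final String name;
        ElementParticle(String name) { this.name = name; }
    }

    static final class Wildcard implements Particle {}

    // A model group: sequence, choice or all. The compositor does not matter here.
    static final class Group implements Particle {
        final List<Particle> members;
        Group(Particle... members) { this.members = Arrays.asList(members); }
    }

    /**
     * Appends every element particle reachable from p to out.
     * Returns false if a wildcard was met, i.e. the children cannot be fully enumerated.
     */
    static boolean collectChildren(Particle p, List<ElementParticle> out) {
        if (p instanceof ElementParticle) {
            out.add((ElementParticle) p);
            return true;
        }
        if (p instanceof Wildcard) {
            return false;
        }
        boolean enumerable = true;
        for (Particle member : ((Group) p).members) {
            enumerable &= collectChildren(member, out); // always visits every member
        }
        return enumerable;
    }

    public static void main(String[] args) {
        Particle content = new Group(
                new ElementParticle("title"),
                new Group(new ElementParticle("para"), new ElementParticle("list")));
        List<ElementParticle> found = new ArrayList<>();
        boolean enumerable = collectChildren(content, found);
        for (ElementParticle e : found) System.out.println(e.name);
        System.out.println("enumerable: " + enumerable);
    }
}
```

Returning a flag instead of throwing on a wildcard lets the caller decide whether to give up or to show the partial list with an "anything goes" marker.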
And of course when you know the relation between element names and their permitted child elements, the set of paths is the transitive closure of this relation.
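As a rough sketch of that closure, assuming the parent-to-children relation has already been extracted: Decl below is a hypothetical stand-in for an element declaration (identified by object identity, so two local declarations with the same name stay distinct), and the "/..." suffix is just one possible way to mark a recursive path instead of expanding it forever.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SchemaPaths {
    /** Hypothetical stand-in for an element declaration (assumption, not a real API). */
    static final class Decl {
        final String name;
        final List<Decl> children = new ArrayList<>();
        Decl(String name) { this.name = name; }
        Decl add(Decl... kids) {
            for (Decl k : kids) children.add(k);
            return this;
        }
    }

    /** Lists every path reachable from root, cutting off at the first repeated declaration. */
    static List<String> paths(Decl root) {
        List<String> out = new ArrayList<>();
        walk(root, "", new ArrayDeque<Decl>(), out);
        return out;
    }

    private static void walk(Decl decl, String parentPath, Deque<Decl> ancestors, List<String> out) {
        String path = parentPath + "/" + decl.name;
        if (ancestors.contains(decl)) {
            // Declaration is its own ancestor: record the recursion and stop descending.
            out.add(path + "/...");
            return;
        }
        out.add(path);
        ancestors.push(decl);
        for (Decl child : decl.children) {
            walk(child, path, ancestors, out);
        }
        ancestors.pop();
    }

    public static void main(String[] args) {
        Decl section = new Decl("section");
        section.add(new Decl("title"), new Decl("para"), section); // section may nest in itself
        Decl doc = new Decl("doc").add(new Decl("title"), section);
        for (String p : paths(doc)) System.out.println(p);
    }
}
```

Tracking the ancestor stack rather than a global "visited" set means a declaration that legitimately appears under two different parents is still listed under both.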
I've been working on a project that has methods for 1) extracting all XPaths of elements present in an XML document itself (e.g., the schema definition document), or 2) listing all possible XPaths that might be found in an XML document described by the XSD.
If you are only interested in 1) the problem and my solution have been described and answered (albeit in Scala) at Scala: What is the easiest way to get all leaf nodes and their paths in an XML?
For 2), things are much more complicated, though in fact I used 1) as a starting point, and both 1) (XpathXmlEnumerator) and 2) (XpathXsdEnumerator) share a common interface (XpathEnumerator), for whatever that is worth. Although 2) is much longer, I suppose at ~500 LOC it is still a rather lean implementation, all things considered (but it could probably use more comments; please bug me to add them!). @michael-kay has done a great job of describing many of the difficulties and outlining a possible solution. Perhaps unfortunately, I did not follow his advice to use software that understands a schema component model, but I did use scala.xml to try to simplify working with XML nodes in general. Still, I believe I overcame all the known difficulties of generating XPaths: a high percentage of the information/nodes in an XSD does not need to be understood in order to generate the XPaths of the documents the XSD describes, so one can simply ignore such nodes.
The idea of filtering becomes important to avoid counting nodes that appear everywhere and that you don't really care about in practice, and possibly also to avoid recursion. However, recursion should be detected automatically by the implementation in 2), and further traversal of the given XPath avoided. For filters, there is preliminary support for a custom NodeFilters class; see DdiCodebookSpec for example usage.
You can see some tests that run in the project in the same directory as ShipOrderXsdSpec, which contains some quickly running examples if you want to give it a try. Some of the other tests are not quickly running, and some have existing problems - this is "pre-alpha" software!
Though the solutions are in Scala, I'd be happy to create a Java wrapper (if needed - it may work directly) and even publish it to Maven if anyone actually wants that.
Node n = doc.getDocumentElement(); // root element; getFirstChild() may return a comment or doctype
NodeList nl = n.getChildNodes();
Then you can go through the list of nodes and get the XPath of each node:
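String getXPath(Node node)
{
    Node parent = node.getParentNode();
    // The root element's parent is the Document node, which ends the path.
    if (parent == null || parent.getNodeType() == Node.DOCUMENT_NODE) {
        return "/" + node.getNodeName();
    }
    // Append this node's own name to the path of its parent.
    return getXPath(parent) + "/" + node.getNodeName();
}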
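For completeness, here is a minimal self-contained sketch of the whole walk using the standard JAXP/DOM classes. The class and method names (DomPaths, collect, elementPaths) are made up for illustration. Note that this lists the paths of the elements in the document you parse, without positional predicates, so repeated siblings produce the same path more than once; run against an XSD file it yields paths inside the schema document itself, not paths in the instance documents the schema describes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class DomPaths {
    /** Builds a simple /a/b/c path by walking up to the root element. */
    static String getXPath(Node node) {
        Node parent = node.getParentNode();
        if (parent == null || parent.getNodeType() == Node.DOCUMENT_NODE) {
            return "/" + node.getNodeName();
        }
        return getXPath(parent) + "/" + node.getNodeName();
    }

    /** Depth-first walk that records the path of every element node. */
    static void collect(Node node, List<String> out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text, comments, processing instructions
        }
        out.add(getXPath(node));
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            collect(kids.item(i), out);
        }
    }

    /** Parses the given XML text and returns the paths of all its elements in document order. */
    static List<String> elementPaths(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            collect(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><item><title/></item><item/></order>";
        for (String p : elementPaths(xml)) System.out.println(p);
    }
}
```

If duplicates are unwanted in the UI, collecting into a LinkedHashSet instead of a list keeps one copy of each path in first-seen order.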