
I want to write a function that validates the syntax of an XPath expression independently of any XML/HTML document (so it would catch the typo in something like //p[text(}="Hello"]), but I can't seem to find a full-fledged XPath specification.

Follow-up: so people don't think I'm crazy, here is the context. I'm trying to write a Rust procedural macro that catches malformed XPath expressions at compile time, as a convenience feature for Rust Selenium bindings (see thirtyfour or fantoccini on crates.io). With the other language bindings, you can get 20 minutes into a test suite before Selenium crashes and tells you that you have a typo. I'll probably end up doing FFI out to a well-maintained library in another language (actually https://crates.io/crates/libxml might do the trick), but I thought I'd at least see what the workload would be to write this myself.

TylerH

2 Answers


The W3C XPath Recommendations (XPath 1.0, 2.0, 3.0, and 3.1) are the official specifications for XPath.

If you must write an XPath parser from scratch, at least one of those will be necessary reading. If you do not really need to write an XPath parser from scratch, you might simply use an existing library and examine the return value or catch any parsing exceptions to determine the well-formedness of the passed XPath expression.
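To give a feel for what a document-independent check looks like, here is a minimal sketch in Rust (the language the question targets). It is not an XPath parser: it only verifies that brackets and quotes are balanced, which happens to be enough to flag the `text(}` typo from the question. The function name `check_delimiters` is hypothetical, and the sketch ignores XPath 2.0+ comments `(: ... :)`; a real validator would follow the grammar in one of the Recommendations.

```rust
/// Checks that (), [], {} and string quotes are balanced.
/// This is only a delimiter check, NOT a full XPath syntax check.
pub fn check_delimiters(xpath: &str) -> Result<(), String> {
    // Open brackets seen so far, with their byte positions.
    let mut stack: Vec<(char, usize)> = Vec::new();
    // The quote character of the string literal we are inside, if any.
    let mut quote: Option<(char, usize)> = None;

    for (pos, ch) in xpath.char_indices() {
        // Inside a string literal, only the matching quote is significant.
        if let Some((q, _)) = quote {
            if ch == q {
                quote = None;
            }
            continue;
        }
        match ch {
            '"' | '\'' => quote = Some((ch, pos)),
            '(' | '[' | '{' => stack.push((ch, pos)),
            ')' | ']' | '}' => {
                let expected = match ch {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                match stack.pop() {
                    Some((open, _)) if open == expected => {}
                    Some((open, open_pos)) => {
                        return Err(format!(
                            "'{}' at byte {} does not close '{}' opened at byte {}",
                            ch, pos, open, open_pos
                        ));
                    }
                    None => return Err(format!("unmatched '{}' at byte {}", ch, pos)),
                }
            }
            _ => {}
        }
    }
    if let Some((q, pos)) = quote {
        return Err(format!("unterminated string starting with {} at byte {}", q, pos));
    }
    if let Some((open, pos)) = stack.pop() {
        return Err(format!("'{}' opened at byte {} is never closed", open, pos));
    }
    Ok(())
}
```

Calling `check_delimiters(r#"//p[text(}="Hello"]"#)` returns an `Err`, while `check_delimiters(r#"//p[.="Hello"]"#)` returns `Ok(())`. Anything beyond delimiter balance (axis names, operators, function-call syntax) still needs a grammar-driven parser or an existing library.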

kjhughes
    Many XPath implementations are single-shot compile-and-go, so it may be difficult to distinguish syntax errors from other static errors and dynamic errors. But it depends what you're trying to achieve. – Michael Kay Jan 02 '22 at 10:41

In addition to the official specifications, if your aim is to write a syntax checker, there are tools to assist you, such as grammars and parser generators that consume them. One is REx (https://bottlecaps.de/rex/). Using the grammars available there, such as the one for XPath 3.1, you can generate a parser in a number of target languages, for instance JavaScript (https://martin-honnen.github.io/xpath31fiddle/js/RExXPath31Fast.js), and then have your code use that parser to check for syntax errors, e.g.:

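function parseExample(xpath31Expression) {
    // Declare the parser outside the try block so the catch block can use it,
    // and with const so it does not leak as an implicit global.
    const xpath31Parser = new RExXPath31Fast(xpath31Expression);
    try {
      xpath31Parser.parse_XPath();
      console.log(`${xpath31Expression} is fine.`);
    }
    catch (pe) {
      if (pe instanceof xpath31Parser.ParseException) {
        console.log(`Error in ${xpath31Expression}:`);
        console.log(xpath31Parser.getErrorMessage(pe));
      }
      else {
        // Not a syntax error, so do not swallow it.
        throw pe;
      }
    }
}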
parseExample(`//p[.="Hello"]`);
parseExample(`//p[text(}="Hello"]`);
<script src="https://martin-honnen.github.io/xpath31fiddle/js/RExXPath31Fast.js"></script>
Martin Honnen
    The **REx Parser Generator** is a great suggestion. Related: the answer by its author to [How to consume W3C EBNF-Notation and produce a parser generator?](https://stackoverflow.com/q/56047642/290085) – kjhughes Jan 02 '22 at 14:23