For those that just want a function to use or a more XPath friendly solution
I have created a repository that extends the below code and should generate correct XPaths, but am leaving the below code as is since it is relatively simple and is a good place to start for understanding the code. The repo is on github.
Answer
Here's an implementation inspired by @Samar's answer that generates XPaths (so far without proper attribute notation), is tail-recursive, handles attributes, and doesn't use a mutable collection:
/**
* Helper function to add XPaths to a node sequence; assume a default of root nodes.
*/
def pathifyNodes(nodes: Seq[Node], parPath: String = "/"): Seq[(Node, String)] =
nodes.map{nn => (nn, parPath + nn.label + "/")}
@tailrec
final def uniqueXpaths(
nodes: Seq[(Node, String)], pathData: List[(String, String)] = Nil
): List[(String, String)] = nodes match {
case (node, currentPath) +: rest =>
val newElementData =
if(node.child.isEmpty) List((currentPath, node.text))
else Nil
val newAttributeData = node.attributes.asAttrMap.map{
case (key, value) => (currentPath + "@" + key, value)
}.toList
uniqueXpaths(
rest ++ node.child.flatMap(ns => pathifyNodes(ns, currentPath)),
newElementData ::: newAttributeData ::: pathData
)
case Seq() => pathData
}
Run like this:
val x = <div class="content"><a></a><p><q>hello</q></p><r><p>world</p></r><s></s></div>
val xpaOut = uniqueXpaths(pathifyNodes(x))
Suggestions welcome. I plan to fix attribute handling to generate proper XPaths that depend on attribute selection, and may also try to handle recursive XPaths in some sensible way, but I suspect this will greatly increase the code size so I wanted to go ahead and paste this for now.