document.evaluate does not return proper text nodes for an XPath

Question

I am creating a "Highlighter" for Android in a WebView. A function gives me an XPath expression for the selected Range in the HTML, like this:

/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[5]

Now I evaluate the above XPath expression in JavaScript like this:

var resNode = document.evaluate('/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[5]',document,null,XPathResult.FIRST_ORDERED_NODE_TYPE ,null);
var startNode = resNode.singleNodeValue;

but startNode is null.

Here is the interesting point: if I evaluate the XPath expression '/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]' in the same way, it returns the proper node, i.e. a 'div'.

The difference between the two XPaths is that the first one ends in a text node and the second one ends in a div.

The same code works fine in desktop browsers.

Edited Sample HTML

<html>
<head>
<script></script>
</head>
<body>
<div id="mainpage" class="highlighter-context">
<div>       Some text here also....... </div>
<div>      Some text here also.........</div>
<div>
  <h1 class="heading"></h1>
  <div class="left_side">
    <ol></ol>
    <h1></h1>
    <div class="text_bio">
    In human beings, height, colour of eyes, complexion, chin, etc. are 
    some recognisable features. A feature that can be recognised is known as 
    character or trait. Human beings reproduce through sexual reproduction. In this                
    process, two individuals one male and another female are involved. Male produces   
    male gamete or sperm and female produces female gamete or ovum. These gametes fuse 
    to form zygote which develops into a new young one which resembles to their parent. 
     During the process of sexual reproduction 
    </div>
  </div>
  <div class="righ_side">
  Some text here also.........
  </div>
  <div class="clr">
         Some text here also.......
  </div>
</div>
</div>
</body>
</html>

getting XPath:

var selection = window.getSelection(); 
var range = selection.getRangeAt(0); 
var xpJson = '{startXPath :"'+makeXPath(range.startContainer)+      
             '",startOffset:"'+range.startOffset+
             '",endXPath:"'+makeXPath(range.endContainer)+ 
             '",endOffset:"'+range.endOffset+'"}';
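
The string built above has unquoted keys, so JSON.parse would reject it. A minimal sketch of the same serialization using JSON.stringify follows; the name serializeRange and the pathFor parameter are hypothetical and not part of the original code, and the offsets are stored as numbers instead of strings.

```javascript
// Hypothetical helper: serialize a Range-like object to valid JSON.
// `pathFor` is any function that maps a node to a path string,
// for example the makeXPath function from the question.
function serializeRange(range, pathFor) {
  return JSON.stringify({
    startXPath: pathFor(range.startContainer),
    startOffset: range.startOffset,
    endXPath: pathFor(range.endContainer),
    endOffset: range.endOffset
  });
}
```

In the page this could be called as serializeRange(window.getSelection().getRangeAt(0), makeXPath), and the result can be read back with JSON.parse.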

function to make XPath:

function makeXPath(node, currentPath) {
    currentPath = currentPath || '';
    switch (node.nodeType) {
    case 3: // text node
    case 4: // CDATA section
        // 1-based index among the parent's text children
        return makeXPath(node.parentNode, 'text()[' + (document.evaluate('preceding-sibling::text()', node, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength + 1) + ']');
    case 1: // element: 1-based index among same-named siblings
        return makeXPath(node.parentNode, node.nodeName + '[' + (document.evaluate('preceding-sibling::' + node.nodeName, node, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength + 1) + ']' + (currentPath ? '/' + currentPath : ''));
    case 9: // document node: the path is complete
        return '/' + currentPath;
    default:
        return '';
    }
}

I am working with HTML in a WebView, not with XML.

I also tried Rangy: its "serialize" works properly, but its "deserialize" does not.

Any ideas what is going wrong?

UPDATE

Finally found the root cause of the problem (no solution yet :( )

What exactly is happening in the Android WebView: somehow, the WebView changes the DOM structure of the loaded HTML page. Even though the DIV in the source HTML does not contain separate text nodes, when I select text from the DIV I get a separate text node for every single line in that DIV. For example, for the same HTML page and the same text selection, the XPath I get from the WebView is entirely different from the one given by the desktop browser.

XPath from Desktop Browser:
startXPath /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[1]
startOffset: 184 
endXPath: /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[1]
endOffset: 342

Xpath from webview:
startXPath :/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[3]
startOffset:0 
endXPath:/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[4]
endOffset:151
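
One possible workaround, sketched here as an assumption and not as the fix the asker ended up using, is to stop storing text node indexes and store a character offset relative to the containing element instead. That offset stays the same whether the element's text is held in one text node or split into several, provided both environments expose the same characters. The function names below are hypothetical.

```javascript
// Collect all text (3) and CDATA (4) nodes under `root` in document order.
// Only nodeType, childNodes and nodeValue are used, so this works on DOM nodes.
function collectTextNodes(root, out) {
  out = out || [];
  var children = root.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType === 3 || child.nodeType === 4) {
      out.push(child);
    } else {
      collectTextNodes(child, out);
    }
  }
  return out;
}

// Convert (textNode, offset) into a character offset relative to `element`.
// Returns -1 if textNode is not inside element.
function toElementOffset(element, textNode, offset) {
  var nodes = collectTextNodes(element);
  var total = 0;
  for (var i = 0; i < nodes.length; i++) {
    if (nodes[i] === textNode) {
      return total + offset;
    }
    total += nodes[i].nodeValue.length;
  }
  return -1;
}

// Convert an element-relative character offset back to (textNode, offset).
// Returns null if the offset is past the end of the element's text.
function fromElementOffset(element, charOffset) {
  var nodes = collectTextNodes(element);
  var remaining = charOffset;
  for (var i = 0; i < nodes.length; i++) {
    var len = nodes[i].nodeValue.length;
    if (remaining <= len) {
      return { node: nodes[i], offset: remaining };
    }
    remaining -= len;
  }
  return null;
}
```

With this approach the stored pair would be the XPath of the element, for example makeXPath(range.startContainer.parentNode) when the container is a text node, plus the value from toElementOffset. When restoring, the element is resolved with document.evaluate and fromElementOffset gives the node and offset to pass to range.setStart or range.setEnd.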

Consider creating an [SSCCE](http://sscce.org/), there's a lot of unrelated code around. And also include some sample XML to work on. — Jens Erat, Jun 08 '13 at 14:02
Sorry for that, removed the commented code. I am using the code for HTML not XML. — Neernay, Jun 08 '13 at 14:16
Please add some XML input (or HTML, doesn't matter in the end); without any document to work on it's not possible to reproduce your problem. — Jens Erat, Jun 09 '13 at 00:08
WoW. This post was a life-saver for me. I struggled for almost two days trying to figure out why my XPath DOM Node calls were not working in my Flutter (webview) App ... until I stumbled onto this post. The WebView does indeed change the DOM Node structure - and the only way to really solve this problem is to debug your mobile app using the Browser DevTools. A good place to start looking for a solution is here -> `https://www.loginworks.com/blogs/inspect-webview-using-chrome-browser/` <- This really saved my ass - much appreciated @Neernay !!! — SilSur, Sep 28 '22 at 20:10

score 1 · Accepted Answer · answered Jun 09 '13 at 13:23

Well, in your sample the path /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[5] selects the fifth text child node of the div element:

<div class="text_bio">
In human beings, height, colour of eyes, complexion, chin, etc. are 
some recognisable features. A feature that can be recognised is known as 
character or trait. Human beings reproduce through sexual reproduction. In this                
process, two individuals one male and another female are involved. Male produces   
male gamete or sperm and female produces female gamete or ovum. These gametes fuse 
to form zygote which develops into a new young one which resembles to their parent. 
 During the process of sexual reproduction 
</div>

That div has a single text child node, so I don't see why text()[5] should select anything.
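
To check how many text children the div really has in a given environment, a small diagnostic along these lines can be run in both the desktop browser and the WebView. It is only a sketch, and the name describeTextChildren is hypothetical.

```javascript
// List the direct text (3) and CDATA (4) children of an element,
// with their 1-based position among those children and their length.
function describeTextChildren(element) {
  var result = [];
  var children = element.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType === 3 || child.nodeType === 4) {
      result.push({
        position: result.length + 1,
        length: child.nodeValue.length
      });
    }
  }
  return result;
}
```

For the sample page this could be called as describeTextChildren(document.querySelector('.text_bio')). If the two environments report a different number of entries, the DOM trees differ and a text()[n] index from one will not necessarily resolve in the other.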

answered Jun 09 '13 at 13:23

Martin Honnen


You are absolutely right, but it raises another question about XPath: if the div doesn't contain that many child text nodes, how come the XPath gets a text node index greater than one when selecting text from that "div"? What do you say? – Neernay Jun 10 '13 at 05:44
It is rather difficult to read code snippets posted in comments. I don't see why your code would return `[5]` when doing `'text()[' + (document.evaluate('preceding-sibling::text()', node, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength + 1) + ']'` for the sample you posted, unless the contents of the `div` has been created with the DOM API and adjacent `text` nodes have been created. In that case you need to normalize the DOM before trying to create XPath expressions, see http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-normalize. – Martin Honnen Jun 10 '13 at 10:21
OK, moved the code to the question. But it's a static HTML page loaded in the WebView, with no dynamic creation of any element. Will try your suggestion. – Neernay Jun 10 '13 at 10:51
@neernay, I am not familiar with Android details and webview and what kind of DOM implementation that uses. If the `normalize` suggestion does not help then maybe someone else with knowledge of that area can help further; I looked at it mainly based on XPath and Javascript and DOM and there with a single text node `text()[5]` is not going to select anything. – Martin Honnen Jun 10 '13 at 12:16
There is one interesting thing: when the same HTML is rendered in a desktop browser, the makeXPath() function returns the same XPath even if there are no text nodes in that div. Any guesses? – Neernay Jun 10 '13 at 14:56
created a jsFiddle. just select some text from the div, onmouse up it alerts out the start and end XPath. http://jsfiddle.net/neernay/aBh7w/ – Neernay Jun 11 '13 at 09:05
The example in the jsfiddle is different, there the `div class="text1"` has mixed contents i.e. text child nodes mixed with element child nodes (e.g. `character`), that way of course an XPath index greater than `1` is possible and correct. – Martin Honnen Jun 11 '13 at 09:50
Hey Martin, you nailed it! From your previous comment I worked out what exactly is happening in the Android WebView. Somehow, the WebView changes the DOM structure of the loaded HTML page. Even though the DIV in the source HTML does not contain separate text nodes, when I select text from the DIV I get a separate text node for every single line in that DIV. For example, for the same HTML page and the same text selection, I get a different XPath from the WebView than from the desktop browser... – Neernay Jun 11 '13 at 12:24
... XPath from Desktop Browser: `startXPath : /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[1] startOffset: 184 endXPath: /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[1] endOffset: 342` --------------------------------------------------------------------- Xpath from webview: `startXPath : /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[3] startOffset:0 endXPath: /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[4] endOffset:151` Don't know why this is happening, but i am glad at least i got the cause for the problem, now i can workaround the problem. Thanks Buddy! – Neernay Jun 11 '13 at 12:31
@neernay Did you solve this problem? I ran into the same issue when getting the XPath of my selection's startNode/endNode. Can you give me some help? – Xianfeng.Cai Nov 04 '13 at 05:16
