XPath: Find HTML element by plain text

Question

Please note: A more refined version of this question, with an appropriate answer can be found here.

I would like to use the Selenium Python bindings to find elements with a given text on a web page. For example, suppose I have the following HTML:

<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>

I need to search by text and am able to find <someElement> using the following XPath:

//*[contains(text(), 'This can be found')]

I am looking for a similar XPath that lets me find <someOtherElement> using the plain text "This can not be found". The following does not work:

//*[contains(text(), 'This can not be found')]

I understand that this is because of the nested em element that "disrupts" the text flow of "This can not be found". Is it possible via XPaths to, in a way, ignore such or similar nestings as the one above?

score 19 · Accepted Answer · edited May 23 '17 at 12:18

You can use //*[contains(., 'This can not be found')].

The context node . will be converted to its string representation before comparison to 'This can not be found'.
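To see why `.` behaves differently from `text()`, here is a minimal sketch using only Python's standard library. It is not an XPath engine: it assumes that `"".join(element.itertext())` is a fair stand-in for the string value of `.`, and that `element.text` is a fair stand-in for the first text node, which is what `contains(text(), ...)` ends up comparing in XPath 1.0. The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>"""

root = ET.fromstring(DOC)
other = root.find(".//someOtherElement")

# Stand-in for the first text() node: only the text before the <em> child.
leading_text = other.text

# Stand-in for the string value of ".": all descendant text nodes,
# concatenated in document order.
string_value = "".join(other.itertext())

print(repr(leading_text))   # 'This can '
print(repr(string_value))   # 'This can not be found'
```

The nested `<em>` splits the element's text into several text nodes, so the first one alone never contains the full phrase, while the concatenated string value does.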

Be careful though: because you are using //*, it will match ALL enclosing elements whose string value contains this text.

In your example case, it will match:

<someOtherElement>
and <body>
and <html>!

You could restrict this by targeting specific element tags or a specific section of your document (a <table> or <div> with a known id or class).
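The over-matching, and the effect of restricting by tag, can be reproduced with a small standard-library sketch. `find_by_text` is a hypothetical helper, not part of Selenium or any XPath library, and it emulates `contains(., ...)` with plain substring checks rather than evaluating XPath.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>"""
NEEDLE = "This can not be found"


def find_by_text(scope, needle, tag=None):
    """Return elements in scope (scope itself included) whose concatenated
    text contains needle. tag=None means any tag, similar to //*."""
    return [el for el in scope.iter(tag) if needle in "".join(el.itertext())]


root = ET.fromstring(DOC)

# Unrestricted: every enclosing element matches as well.
all_tags = [el.tag for el in find_by_text(root, NEEDLE)]
print(all_tags)         # ['html', 'body', 'someOtherElement']

# Restricted to one tag name: only the element we were after.
restricted_tags = [el.tag for el in find_by_text(root, NEEDLE, tag="someOtherElement")]
print(restricted_tags)  # ['someOtherElement']
```

Passing a subtree as `scope` instead of the root would play the role of limiting the search to a known section of the document.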

Edit for the OP's question in comment on how to find the most nested elements matching the text condition:

The accepted answer here suggests //*[count(ancestor::*) = max(//*/count(ancestor::*))] to select the most nested element. I think this only works in XPath 2.0.

When combined with your substring condition, I was able to test it here with this document

<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>

and with this XPath 2.0 expression

//*[contains(., 'This can not be found')]
   [count(ancestor::*) = max(//*/count(./*[contains(., 'This can not be found')]/ancestor::*))]

And it matches the element containing "This can not be found most nested".
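If no XPath 2.0 engine is at hand, the same "deepest match" idea can be approximated in plain Python. This is only a sketch under assumptions: `matches_with_depth` is a hypothetical helper, depth is counted as the number of ancestors (like `count(ancestor::*)`), and text matching is emulated with substring checks on the concatenated text.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>"""
NEEDLE = "This can not be found"


def matches_with_depth(element, needle, depth=0):
    """Yield (depth, element) for every element whose concatenated text
    contains needle; depth is the number of ancestors of the element."""
    if needle in "".join(element.itertext()):
        yield depth, element
    for child in element:
        yield from matches_with_depth(child, needle, depth + 1)


root = ET.fromstring(DOC)
found = list(matches_with_depth(root, NEEDLE))

# Keep only the matches at the greatest depth.
max_depth = max(depth for depth, _ in found)
deepest = [el for depth, el in found if depth == max_depth]

deepest_texts = ["".join(el.itertext()) for el in deepest]
print(deepest_texts)  # ['This can not be found most nested']
```

As with the XPath 2.0 expression, only the single deepest level is returned, so the second `<someOtherElement>` (one level higher) is left out.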

There probably is a more elegant way to do that.

answered Sep 06 '13 at 10:41 by paul trmbrth · edited May 23 '17 at 12:18 by Community

Thank you for your reply. Given your suggestion, is it possible to somehow select only the most deeply nested element, without further restricting on a specific tag or the like? – Michael Herrmann Sep 06 '13 at 10:46
@MichaelHerrmann, for the most-deeply nested element I'd have to search (on SO probably), but for the last one for example you could use `(//*[contains(., 'This can not be found')])[last()]` (note the brackets) – paul trmbrth Sep 06 '13 at 10:48
@MichaelHerrmann, according to [this SO answer](http://stackoverflow.com/questions/11135620/how-to-get-the-most-deeply-nested-element-nodes-using-xpath-implementation-wit) it's possible in XPath 2.0 with `//*[count(ancestor::*) = max(//*/count(ancestor::*))]` (you'd have to combine with the `contains()` test also of course) but I cannot test XPath 2.0 to confirm. – paul trmbrth Sep 06 '13 at 10:59
Thank you very much @pault. I accepted your answer because it answered my original question, as well as my follow-up question on the "most deeply-nested" element. However, by "most deeply-nested", I meant both `<someOtherElement>` elements in your example. I tried `//*[contains(., 'This can not be found')][not(contains(.//*, 'This can not be found'))]`, to no avail. Shall I open a new question, or edit this one? – Michael Herrmann Sep 09 '13 at 16:15
Ha, found it! `//*[contains(., 'This can not be found')][not(.//*[contains(., 'This can not be found')])]` :) Will update my question and your answer, if that's OK. – Michael Herrmann Sep 09 '13 at 16:22
@MichaelHerrmann, it would be better to open another question, and reply yourself to the new one if you have the answer. Then my edit to this question probably needs to be removed also. With the new question, people may come up with other answers – paul trmbrth Sep 09 '13 at 16:31
@pault. I edited my question to include your example and your answer to include the new answer. I hope that is OK also. (I had already edited both before reading your reply). Next time, I will open a new question. I did not want to "spam" by opening a new qn... – Michael Herrmann Sep 09 '13 at 16:34
@MichaelHerrmann, it's not really about spam. If your question is modified with extra conditions to depict your exact issue more accurately, it's better to open a new question. Otherwise the more general answer to the original question, which was accepted and voted for, becomes wrong or inaccurate in the new question's context, and the votes don't mean the same thing anymore. – paul trmbrth Sep 09 '13 at 16:43
@pault. I see. My edit to your answer is still pending. Do you think I should undo my edits and open a new qn? – Michael Herrmann Sep 09 '13 at 16:50
@pault. OK, will do that straight away. – Michael Herrmann Sep 09 '13 at 17:00
@pault. I created the new question here: http://stackoverflow.com/questions/18703467/xpath-find-html-element-by-plain-text/18703468. Will update this question now to at least point to the new one. – Michael Herrmann Sep 09 '13 at 17:14
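
For readers following the thread: the expression found in the comments above keeps every match that has no matching descendant, so it returns all innermost matches rather than only the single deepest level. A rough standard-library emulation, assuming substring checks on concatenated text in place of `contains(., ...)` and with `innermost_matches` as a hypothetical helper name, might look like this:

```python
import xml.etree.ElementTree as ET

DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>"""
NEEDLE = "This can not be found"


def innermost_matches(root, needle):
    """Return elements whose concatenated text contains needle and that
    have no descendant element that also contains it."""
    def contains(el):
        return needle in "".join(el.itertext())

    return [
        el for el in root.iter()
        # el.iter() yields el itself first, so skip it when checking descendants.
        if contains(el) and not any(contains(d) for d in el.iter() if d is not el)
    ]


root = ET.fromstring(DOC)
innermost_texts = ["".join(el.itertext()) for el in innermost_matches(root, NEEDLE)]
print(innermost_texts)
# ['This can not be found most nested', 'This can not be found']
```

Both `<someOtherElement>` elements come back here, while `<html>`, `<body>` and `<nested>` are dropped because each has a descendant that already contains the text.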
