
I'm having a problem with a text node that I can't convert to a string. I'm trying to scrape a site and extract certain information from it, and when I use an XPath to find the text I'm after, I get a text node back. When I look in the Chrome developer tools, I can see that the text node itself contains the text I'm after, but how do I convert the text node to plain text?

Here is the line of code I use:

abstracts = ZU.xpath(doc, '//*[@id="abstract"]/div/div/par/text()');

I have tried things like `.innerHTML`, `toString()` and `.textContent`, but nothing has worked so far.

anderssinho
  • Did you try console.log(abstracts) to see what it contains? I don't know what ZU is, but most XPath query methods return a node list, from which you would have to extract your data. – James Oct 30 '15 at 06:37
  • For some reason I can't use that in the extension I'm building. But if I use Firebug or the Chrome dev tools and run the XPath, I can see what I should get back in `abstracts`, and that's a text node. `ZU` is just already declared in the program I'm currently trying to extend. – anderssinho Oct 30 '15 at 06:39
  • You will need to read the documentation of the XPath library that provides the `ZU.xpath` method to see what kind of result it returns. If it is a W3C DOM node, you can read its `nodeValue` property, but since a path selecting nodes can in general return several nodes, a method could just as well return a collection or an iterator. Other APIs, for simplicity, might return a string value. So we can't really tell what `ZU.xpath` returns; we would need to see the API documentation. – Martin Honnen Oct 30 '15 at 10:15
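A minimal sketch of what these comments describe, assuming `ZU.xpath` returns either a single DOM node, an array or NodeList of nodes, or already a string (an assumption; only its documentation can confirm this). The helper name `xpathResultToText` is hypothetical. For a text node it reads `nodeValue`; for any other node it falls back to `textContent`.

```javascript
// Node type constant for text nodes (same value as Node.TEXT_NODE in browsers).
const TEXT_NODE = 3;

// Hypothetical helper: turn whatever an XPath helper returned into a plain string.
// Assumes the result is a single DOM node, an Array/NodeList of nodes, or a string.
function xpathResultToText(result, separator = " ") {
  if (result == null) return "";
  if (typeof result === "string") return result;

  // Check for a single node first: text nodes also have a numeric `length`
  // property, so testing for `length` alone would misclassify them as collections.
  if (typeof result.nodeType === "number") {
    return result.nodeType === TEXT_NODE ? result.nodeValue : result.textContent;
  }

  // Otherwise treat it as a collection (Array or NodeList) and join each node's text.
  return Array.from(result, (node) => xpathResultToText(node, separator)).join(separator);
}
```

Applied to the question's code, `xpathResultToText(abstracts)` would give the abstract as one string, with several text nodes joined by a space. If `ZU.xpath` turns out to return something else, such as a DOM `XPathResult`, the helper would need adjusting.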



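I usually use `Text.wholeText` when I want the content string of a text node. A text node is an object, not a string, so `toString()` gives you `[object Text]` instead of its content, and `innerHTML` does not work because it only exists on elements, not on text nodes. (`nodeValue` and `textContent` do return a text node's own data, so if those seem to fail, the XPath call may have returned a collection of nodes rather than a single node.)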

Example from https://developer.mozilla.org/en-US/docs/Web/API/Text/wholeText:

The Text.wholeText read-only property returns the full text of all Text nodes logically adjacent to the node. The text is concatenated in document order. This allows you to specify any text node and obtain all adjacent text as a single string.

Syntax

str = textnode.wholeText;

Notes and example: Suppose you have the following simple paragraph within your webpage (with some whitespace added to aid formatting throughout the code samples here), whose DOM node is stored in the variable para:

<p>Thru-hiking is great!  <strong>No insipid election coverage!</strong>
However, <a href="http://en.wikipedia.org/wiki/Absentee_ballot">casting a
ballot</a> is tricky.</p>

You decide you don’t like the middle sentence, so you remove it:

para.removeChild(para.childNodes[1]);

Later, you decide to rephrase things to, “Thru-hiking is great, but casting a ballot is tricky.” while preserving the hyperlink. So you try this:

para.firstChild.data = "Thru-hiking is great, but ";

All set, right? Wrong! What happened is that you removed the strong element, but that element had separated two text nodes: one for the first sentence and one for the first word of the last sentence. Setting firstChild.data changed only the first of those text nodes, so you now effectively have this:

<p>Thru-hiking is great, but However, <a
href="http://en.wikipedia.org/wiki/Absentee_ballot">casting a
ballot</a> is tricky.</p>

You’d really prefer to treat all those adjacent text nodes as a single one. That’s where wholeText comes in: if you have multiple adjacent text nodes, you can access the contents of all of them using wholeText. Let’s pretend you never made that last mistake. In that case, we have:

assert(para.firstChild.wholeText == "Thru-hiking is great!    However, ");

wholeText is just a property of text nodes that returns the string of data making up all the adjacent (i.e. not separated by an element boundary) text nodes combined.
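To make that definition concrete, here is an illustrative sketch (not the browser's actual implementation) that computes the same string by walking a text node's siblings. The function name `wholeTextOf` is hypothetical; in a real page you would simply read `node.wholeText`. For simplicity it only treats nodes with `nodeType === 3` as text, so CDATA sections, which the DOM also counts as text nodes, are not handled.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

// Hypothetical helper mirroring what Text.wholeText returns: the data of this
// text node plus every directly adjacent text-node sibling, in document order.
function wholeTextOf(textNode) {
  // Walk back to the first text node of the contiguous run.
  let start = textNode;
  while (start.previousSibling && start.previousSibling.nodeType === TEXT_NODE) {
    start = start.previousSibling;
  }
  // Walk forward, concatenating data until a non-text node (or the end) is reached.
  let result = "";
  for (let n = start; n && n.nodeType === TEXT_NODE; n = n.nextSibling) {
    result += n.data;
  }
  return result;
}
```

Calling `wholeTextOf` on either of the two text nodes before the link gives the same string, because an element boundary is what ends the run.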

Now let’s return to our original problem. What we want is to be able to replace the whole text with new text. That’s where replaceWholeText() comes in:

para.firstChild.replaceWholeText("Thru-hiking is great, but ");

We’re removing every adjacent text node (all the ones that constituted the whole text) but the one on which replaceWholeText() is called, and we’re changing the remaining one to the new text. What we have now is this:

<p>Thru-hiking is great, but <a
href="http://en.wikipedia.org/wiki/Absentee_ballot">casting a
ballot</a> is tricky.</p>

Some uses of the whole-text functionality may be better served by using Node.textContent, or the longstanding Element.innerHTML; that's fine and probably clearer in most circumstances. If you have to work with mixed content within an element, as seen here, wholeText may be useful. Note that replaceWholeText() has since been removed from the DOM standard and is not available in current browsers.
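To get the same effect without relying on replaceWholeText(), a minimal sketch using only standard DOM properties and methods (`parentNode`, `previousSibling`, `nextSibling`, `removeChild`, `data`) could look like this. The function name `replaceWholeTextOf` is hypothetical. It always keeps the node it is called on, even when `newText` is empty.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

// Hypothetical helper: replace the "whole text" around textNode with newText.
// Every text-node sibling contiguous with textNode is removed, and textNode
// itself is kept and given the new data.
function replaceWholeTextOf(textNode, newText) {
  const parent = textNode.parentNode;
  // Remove contiguous text nodes before textNode.
  while (textNode.previousSibling && textNode.previousSibling.nodeType === TEXT_NODE) {
    parent.removeChild(textNode.previousSibling);
  }
  // Remove contiguous text nodes after textNode.
  while (textNode.nextSibling && textNode.nextSibling.nodeType === TEXT_NODE) {
    parent.removeChild(textNode.nextSibling);
  }
  textNode.data = newText;
  return textNode;
}
```

In the example above this would be called as `replaceWholeTextOf(para.firstChild, "Thru-hiking is great, but ")`. Another option is to call `para.normalize()`, which merges adjacent text nodes into one, and then set `para.firstChild.data`.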

More info: https://developer.mozilla.org/en-US/docs/Web/API/Text/wholeText

Richard Ramos
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Al Foиce ѫ Jun 14 '18 at 09:41
  • Added an example from https://developer.mozilla.org/en-US/docs/Web/API/Text/wholeText. – Richard Ramos Jun 15 '18 at 07:34