Get text nodes with their parent elements sequentially (harder than it seems)

Question

Consider the following markup:

<div>Hello, my <span>name</span> is John</div>

I need to get the text nodes and the elements containing them, sequentially, as in:

1: "Hello, my ", <div> (HTMLElement)

2: "name", <span> (HTMLElement)

3: " is John", <div> (HTMLElement)

This needs to be done in a way that allows me to get the CSS styles of these HTMLElements later.

What I already tried:

$(foo).find('*').contents().filter(function() {
    var $this = $(this);
    return this.nodeType === 3 && $.trim($this.text()).length > 0; 
});

This results in a non-sequential result set, as in:

1: "Hello, my "

2: " is John"

3: "name"

However, I can access their parent elements, so this does half of the job.

The main question, therefore, is: how do I get text nodes in the same sequence they are in the document?

Do you need to know what their element is? Would `innerText` not be sufficient? — epoch, Oct 09 '12 at 05:54
Your problem is easily-enough answered with depth-first recursion. Basically, you set the current element, then you check its list of child nodes (`for loop`), in order. Notice I say nodes and not elements, checking each one by type. If it's a text-node, you've already got the parent (current element), and if it's a DOM element, then you pass it in as a parameter to the same function (recursion). The function continues depth-first (think about going inside the span, vs the rest outside) until a node has no children. I'd write it out, but I'm on a phone. — Norguard, Oct 09 '12 at 06:13

score 2 · Accepted Answer · answered Oct 09 '12 at 06:22

As described in Norguard’s comment, this can be solved by traversing the document tree:

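// Walk the subtree of `element` depth-first, in document order.
function process(element) {
  var children = element.childNodes;
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType === 3) {          // 3 === Node.TEXT_NODE
      if (child.data) {                  // skip empty text nodes
        processText(child.data, element);
      }
    } else if (child.nodeType === 1) {   // 1 === Node.ELEMENT_NODE
      process(child);                    // recurse into child elements only
    }
  }
}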

You can then call this function, passing the desired element as a parameter, provided that you have written your specific processing of the text nodes as a function processText(text, parent).
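As a rough illustration of how the traversal above might be applied to the markup in the question, the sketch below collects the pairs into an array instead of calling processText. The name `collectTextNodes` and the `{ text, parent }` result shape are assumptions made for this example, as is the choice to skip whitespace-only text (which mirrors the filter in the question).

```javascript
// Hypothetical helper: returns [{ text, parent }, ...] in document order.
function collectTextNodes(root) {
  var result = [];
  (function walk(element) {
    var children = element.childNodes;
    for (var i = 0; i < children.length; i++) {
      var child = children[i];
      if (child.nodeType === 3) {            // 3 === Node.TEXT_NODE
        // Skip whitespace-only text, as the filter in the question does.
        if (child.data.trim().length > 0) {
          result.push({ text: child.data, parent: element });
        }
      } else if (child.nodeType === 1) {     // 1 === Node.ELEMENT_NODE
        walk(child);                         // depth-first into child elements
      }
    }
  })(root);
  return result;
}

// Browser usage (assumes the <div> from the question is the first div on the page):
// collectTextNodes(document.querySelector('div')).forEach(function (item) {
//   console.log(item.text, item.parent.tagName,
//               window.getComputedStyle(item.parent).color);
// });
```

Because each entry keeps a reference to the parent element itself rather than a copy, its styles can still be read later with `window.getComputedStyle(item.parent)`.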

Note: I’m assuming childNodes contains the children in the order in which they appear in the document source. This sounds very natural, but I am unable to find a requirement on this in specifications, oddly enough – perhaps it is taken as self-evident.

The [childNodes](http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1451460987) object is a [NodeList](http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-536297177), which `provides the abstraction of an ordered collection of nodes`, where I think "order" can safely be assumed to mean "DOM order" (though perhaps the specification should state that explicitly somewhere, it uses the term quite often). — RobG, Oct 09 '12 at 06:27
I've apparently achieved the same effect using `TreeWalker` (I strongly believe it works almost the same way). — Stas Bichenko, Oct 09 '12 at 06:27
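For comparison, a `TreeWalker` version might look roughly like the sketch below. This is not the commenter's actual code, which was not posted. The helper name `collectWithTreeWalker` and the result shape are hypothetical, and the sketch assumes `root` is an element, so that `root.ownerDocument` is its document.

```javascript
// Hypothetical helper: collects text nodes and their parents with a TreeWalker.
function collectWithTreeWalker(root) {
  var SHOW_TEXT = 4; // numeric value of NodeFilter.SHOW_TEXT
  // Assumes root is an element; root.ownerDocument is then its document.
  var walker = root.ownerDocument.createTreeWalker(root, SHOW_TEXT, null);
  var result = [];
  var node;
  // nextNode() yields matching nodes in document order, then null at the end.
  while ((node = walker.nextNode())) {
    result.push({ text: node.data, parent: node.parentNode });
  }
  return result;
}

// Browser usage:
// collectWithTreeWalker(document.querySelector('div'));
```

The walker visits only text nodes, so no explicit recursion is needed, and `parentNode` supplies the containing element for each one.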
