-1

I managed to write this regular expression to get the inner HTML of a td tag:

<td[^>]*>(.*?)<\/td>

It works, except that the match includes the td tags themselves. I just want the innerHTML, not the outerHTML. You can find a demo of my problem here.

Can anyone help me get the text between the td tags?

P.S. I am manipulating a string here, not an HTML element.

Chand Ra

2 Answers

1

Use the DOM even for parsing HTML strings. HTML can be too tricky for a regex to stay efficient.

var s = 'this is a nice day<table><tr><td>aaaa <b>bold</b></td></tr><tr><td>bbbb</td></tr></table> here.';
var doc = document.createDocumentFragment();
var wrapper = document.createElement('myelt');
wrapper.innerHTML = s;
doc.appendChild( wrapper );
var arr = []; // declared with var to avoid creating an implicit global
var n,walk=document.createTreeWalker(doc,NodeFilter.SHOW_ALL,null,false);
while(n=walk.nextNode())
{
      if (n.nodeName.toUpperCase() === "TD") {
         arr.push(n.innerHTML); 
      }
}
// See it works:
console.log(arr); // or...
for (var r = 0; r < arr.length; r++) {
 document.getElementById("r").innerHTML +=  arr[r] + "<br/>";
}
<div id="r"></div>
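A shorter variant of the same DOM-based idea, as a minimal sketch: it assumes a browser environment where `DOMParser` is available, and the function name `tdInnerHtml` is hypothetical. It parses the string into a detached document and collects the `innerHTML` of every `td` with `querySelectorAll`.

```javascript
// Hypothetical helper: returns the innerHTML of every <td> found in an HTML string.
// Assumes a browser (DOMParser is not available in plain Node.js).
function tdInnerHtml(html) {
  // Parse into a detached document so the live page is not touched.
  var doc = new DOMParser().parseFromString(html, 'text/html');
  return Array.from(doc.querySelectorAll('td'), function (td) {
    return td.innerHTML;
  });
}

// Only run the demo where a DOM parser exists.
if (typeof DOMParser !== 'undefined') {
  var sample = 'this is a nice day<table><tr><td>aaaa <b>bold</b></td></tr><tr><td>bbbb</td></tr></table> here.';
  console.log(tdInnerHtml(sample));
}
```

Like the tree walker above, this also picks up cells of nested tables, so filter the result if only top-level cells are wanted.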
Wiktor Stribiżew
0

You actually already have the regex you need. You're just confusing matches with captures. Your regex matches the outer HTML, but it captures the inner HTML. Just do a match and read the first capture group. Check it out in this fiddle.

Here's the code

var s = '<table cellspacing="0px;" cellpadding="8px;"><tr><td align="right" style="padding-right:8px;line-height:18px;vertical-align:top;"><b>Import job summary</b></td><td align="left" style="max-width:300px;line-height:18px;vertical-align:top;"> 5 entries were imported successfully. 0 entries failed to import. </td></tr></table>',
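    re = /<td[^>]*>(.*?)<\/td>/g,
    m,
    inner = [];

// With the g flag, s.match(re) returns only the whole matches (the outer HTML),
// so loop with exec() and read capture group 1 instead.
while ((m = re.exec(s)) !== null) {
    inner.push(m[1]);
}
if (inner.length === 0) {
    // Nothing was captured
    inner = ['No match'];
}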
document.write( 'Inner is:<br>' + inner.join('<br>') );
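To make the match/capture distinction concrete, here is a small side-by-side sketch. The input string is a made-up sample, not the one from the question's demo, and it assumes an engine that supports `String.prototype.matchAll` (ES2020).

```javascript
// Hypothetical sample input for illustration only.
var html = '<tr><td class="a">one</td><td><b>two</b></td></tr>';
var re = /<td[^>]*>(.*?)<\/td>/g;

// match() with the g flag returns whole matches only: the outer HTML.
var outer = html.match(re) || [];

// matchAll() yields one result per match; index 0 is the whole match,
// index 1 is the first capture group: the inner HTML.
var inner = Array.from(html.matchAll(re), function (m) {
  return m[1];
});

console.log(outer); // [ '<td class="a">one</td>', '<td><b>two</b></td>' ]
console.log(inner); // [ 'one', '<b>two</b>' ]
```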

Regards

SamWhan
  • I am not against using regex to handle HTML in *some* cases, but this is definitely not one of them. First, `.*?` does not match newlines. Second, even if you use `[^]*?`, the backtracking buffer can simply be overrun with long HTML strings. Surely it will work with our small examples, but in real-life code this might cause issues. – Wiktor Stribiżew Nov 06 '15 at 12:17
  • 1
    @stribizhev I agree, in *most* cases this is true. However, if you are certain the input isn't going to be too complex (e.g. you generate it yourself) **and** performance is an issue, regex might be a solution (imo ;). – SamWhan Nov 06 '15 at 12:34
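The newline point from the first comment can be checked with a short sketch. The sample cell below is made up for illustration; `[\s\S]` is used here as a portable stand-in for the JavaScript-specific `[^]` mentioned in the comment.

```javascript
// Hypothetical input: a cell whose content spans two lines.
var cell = '<td>line one\nline two</td>';

// "." does not match line terminators, so the lazy group cannot reach </td>.
var dotRe = /<td[^>]*>(.*?)<\/td>/;

// [\s\S] matches any character, including newlines.
var anyRe = /<td[^>]*>([\s\S]*?)<\/td>/;

var dotResult = dotRe.exec(cell); // null: no match across the newline
var anyResult = anyRe.exec(cell); // match; group 1 holds both lines

console.log(dotResult);
console.log(anyResult[1]);
```

This only addresses the newline issue; it does nothing about the backtracking cost on long inputs raised in the same comment.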