
Consider code like the following:

<p>&nbsp;</p><!-- comment -->
<span></span><br />
<div><span class="foo"></span></div>

which a browser would effectively render as a stretch of whitespace.

I'm wondering if, given that or similar markup, there is a straightforward, programmatic way to detect that the end result of this code with whitespace stripped is an empty string.

The implementation here is JavaScript, but I'm also interested in a more general (language-agnostic) solution if one exists.

Note that just stripping out the tags and seeing if any text remains is not a real fix, as there are plenty of tags which render visible content even when empty (e.g. img, hr).

Jordan Reiter
  • You can use CSS to render content to a page, so just looking at the markup may not be enough – Musa Jun 21 '17 at 12:26
  • Use the DOM API, have a list of characters you consider whitespace, recursively confirm whether the only content of any given node is whitespace text (or the node is a comment etc.) and remove that node if so; if you're left with no nodes it was all whitespace. – Note that this won't catch white text on white background for example… – deceze Jun 21 '17 at 12:27
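The DOM walk described in the comment above could be sketched roughly as follows. This is only an illustration, not code from the commenter: the function name nodeIsBlank is made up, the list of always-visible tags is borrowed from the answer below, and "whitespace" is taken to mean whatever String.prototype.trim removes (which includes the non-breaking space behind &nbsp;). Instead of removing nodes, it simply reports whether every node is blank.

```javascript
// Tags assumed to render even when empty (borrowed from the answer below;
// adjust this list for your own markup).
var VISIBLE_TAGS = ['img', 'iframe', 'object', 'hr', 'audio', 'video',
                    'form', 'button', 'input', 'select', 'textarea'];

// Standard DOM nodeType values.
var ELEMENT_NODE = 1, TEXT_NODE = 3, COMMENT_NODE = 8;

// Hypothetical helper: true when `node` and all of its descendants
// contribute nothing but whitespace.
function nodeIsBlank(node) {
  if (node.nodeType === COMMENT_NODE) {
    return true;
  }
  if (node.nodeType === TEXT_NODE) {
    // trim() also strips U+00A0, the character produced by &nbsp;
    return node.nodeValue.trim() === '';
  }
  if (node.nodeType === ELEMENT_NODE &&
      VISIBLE_TAGS.indexOf(node.tagName.toLowerCase()) !== -1) {
    return false;
  }
  // Any other node: blank only if every child is blank.
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i += 1) {
    if (!nodeIsBlank(children[i])) {
      return false;
    }
  }
  return true;
}
```

In a browser, one would presumably parse the markup into a detached div (by setting its innerHTML) and pass that element to nodeIsBlank. As the comment notes, a walk like this knows nothing about CSS, so it cannot catch white text on a white background or content generated by stylesheets.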

1 Answer


This is the answer I came up with. It uses a whitelist of tags that are assumed to render on the page whether or not they have content; all other tags are assumed to render only if they contain actual text. With that in place, the solution is fairly simple: it relies on the fact that the innerText property returns an element's text with all tags stripped out.

Note that this solution ignores elements that render purely because of CSS (e.g. blocks with a background color, or content set via the :before or :after pseudo-elements); fortunately, that isn't relevant for my use case.

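function htmlIsWhitespace(input) {
  // Tags assumed to render even when they have no text content.
  var visible = [
        'img', 'iframe', 'object', 'hr',
        'audio', 'video',
        'form', 'button', 'input', 'select', 'textarea'
      ],
      container = document.createElement('div');
  // Let the browser parse the markup into a detached element.
  container.innerHTML = input;
  // Any non-whitespace text? (trim() also strips the non-breaking space from &nbsp;)
  if (container.innerText.trim().length > 0) {
    return false;
  }
  // Otherwise it is whitespace unless one of the always-visible tags is present.
  return container.querySelector(visible.join(',')) === null;
}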

// And the tests (I believe these are comprehensive):

var testStringsYes = [
  "",
  "<a href='#'></a>",
  "<a href='#'><span></span></a>",
  "<a href='#'><span> <!-- comment --></span></a>",
  "<a href='#'><span> &nbsp;</span></a>",
  "<a href='#'><span> &nbsp; </span></a>",
  "<a href='#'><span> &nbsp;</span></a> &nbsp;",
  "<p><a href='#'><span> &nbsp;</span></a> &nbsp;</p>",
  " <p><a href='#'><span> &nbsp;</span></a> &nbsp;</p> &nbsp; <p></p>",
  "<p>\n&nbsp;\n</p><ul><li></li></ul>"
 ],
 testStringsNo = [
  "<a href='#'><span> &nbsp;hi</span></a>",
  "<img src='#foo'>",
  "<hr />",
  "<div><object /></div>",
  "<div><iframe /></div>",
  "<div><!-- hi -->bye</div>",
  "<div><!-- what --><audio></audio></div>",
  "<div><!-- what --><video></video></div>",
  '<form><!-- empty --></form>',
  '<input type="text">',
  '<select name="foo"><option>1</option></select>',
  '<textarea>',
  '<form><input type="button"></form>',
  '<button />',
  '<button>Push</button>',
  "yo"
 ];

for(var yy=0, yl=testStringsYes.length; yy < yl; yy += 1) {
 console.debug("Testing", testStringsYes[yy]);
 console.assert(htmlIsWhitespace(testStringsYes[yy]));
}

for(var nn=0, nl=testStringsNo.length; nn < nl; nn += 1) {
 console.debug("Testing", testStringsNo[nn]);
 console.assert(!htmlIsWhitespace(testStringsNo[nn]));
}
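
For the language-agnostic case raised in the question, a DOM-free variant can be approximated with plain string handling, which ports to most languages with a regex library. The following is a rough sketch under stated assumptions rather than a robust parser: the name markupIsWhitespace is made up, the tag list is the same whitelist as above, only the &nbsp; family of entities is decoded, and regex-based tag stripping will misfire on a ">" inside an attribute value or on script/style contents.

```javascript
// Same whitelist as above, expressed as a pattern matching an opening tag.
var VISIBLE_TAG_PATTERN =
  /<(img|iframe|object|hr|audio|video|form|button|input|select|textarea)\b/i;

// Hypothetical helper: true when the markup contains no visible tag and no
// text other than whitespace. Works on the raw string, no DOM required.
function markupIsWhitespace(markup) {
  // Drop comments first so tags or text inside them are ignored.
  var withoutComments = markup.replace(/<!--[\s\S]*?-->/g, '');
  if (VISIBLE_TAG_PATTERN.test(withoutComments)) {
    return false;
  }
  var text = withoutComments
    .replace(/<[^>]*>/g, '')                // strip the remaining tags
    .replace(/&(nbsp|#160|#xa0);/gi, ' ');  // decode non-breaking space entities
  return text.trim() === '';
}
```

Because it never builds a tree, this version is cheaper but also cruder than the innerText approach; if a real HTML parser is available in the target language, walking its output is likely the safer choice.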
Jordan Reiter