
I need to determine the length of a string that may contain HTML entities.

For example, "&darr;" (↓) gives a length of 6, which is correct, but I want these entities to be counted as only 1 character.

Hedge

6 Answers

<div id="foo">&darr;</div>

alert(document.getElementById("foo").innerHTML.length); // alerts 1

Based on that, create a div, set its innerHTML to your entity-ridden string, then read the content back and check its length.

var div = document.createElement("div");
div.innerHTML = "&darr;&darr;&darr;&darr;";
alert(div.innerHTML.length); // alerts 4


You might want to put that in a function for convenience, e.g.:

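function realLength(str) { // maybe there's a better name?
    var el = document.createElement("div");
    el.innerHTML = str;
    // Use the decoded text: reading innerHTML back would re-escape
    // &, <, > and non-breaking spaces (e.g. "&amp;" stays 5 characters).
    return el.textContent.length;
}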
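Reading `innerHTML` back is not quite the same as reading the decoded text: the browser re-serializes text nodes, escaping `&`, `<`, `>` and the non-breaking space, so `"&amp;"` reads back as 5 characters rather than 1. The sketch below is a simplified, DOM-free model of that escaping that makes the overcount visible; the name `serializeTextLikeInnerHTML` is hypothetical, not a browser API. Measuring `textContent.length` instead avoids the problem.

```javascript
// Hypothetical helper: a simplified model of how a text node is escaped when
// a browser serializes it for innerHTML. Not a real browser API.
function serializeTextLikeInnerHTML(text) {
  return text
    .replace(/&/g, "&amp;") // first, so the entities added below are not re-escaped
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const decoded = "a & b"; // what textContent would give for "a &amp; b"
console.log(decoded.length);                             // 5
console.log(serializeTextLikeInnerHTML(decoded).length); // 9
console.log(serializeTextLikeInnerHTML("\u2193"));       // ↓ (not escaped, length 1)
```

Entities such as `&darr;` come back as the literal character, which is why the single-`&darr;` examples report 1, while strings containing `&amp;`, `&lt;`, `&gt;` or `&nbsp;` do not.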
karim79
  • I [modified the fiddle](http://jsfiddle.net/e89hJ/1/) and tested this solution against the inputs and outputs on the library-based solution I posted and there was some inconsistency. Namely, I entered an encoded input and got 110 chars, then entered a decoded input of the same data and got 96 chars. Not sure why exactly, but it might be worth checking on. – Nathan Taylor Jan 24 '11 at 23:21
  • @Nathan Taylor - The inconsistency could feasibly be caused by jQuery's `.val`, or line breaks. – karim79 Jan 24 '11 at 23:22
  • @Nathan - Second update using the DOM .value property, seems to work just fine, provided you don't hit enter anywhere in the textarea (each \n == 1 char): http://jsfiddle.net/e89hJ/2/ – karim79 Jan 24 '11 at 23:26

Since there's no solution using jQuery yet:

var str = 'lol&amp;';
alert($('<span />').html(str).text().length); // alerts 4

This uses the same approach as karim79's answer, but never adds the created element to the document.

ThiefMaster

You could for most purposes assume that an ampersand followed by letters, or a possible '#' and numbers, followed by a semicolon, is one character.

var strlen=string.replace(/&#?[a-zA-Z0-9]+;/g,' ').length;
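Wrapped in a function, the heuristic looks like this (the name `entityAwareLength` is only illustrative). It counts any `&...;` run as one character without checking that the entity actually exists, and it misses legacy references written without a trailing semicolon (such as `&copy`), which browsers may still decode.

```javascript
// Heuristic from the answer above: treat "&name;", "&#123;" or "&#x1F600;"
// as a single character. "&notreal;" is also counted as one, since the
// entity name is never validated.
function entityAwareLength(str) {
  return str.replace(/&#?[a-zA-Z0-9]+;/g, " ").length;
}

console.log(entityAwareLength("&darr;"));          // 1
console.log(entityAwareLength("&darr;&darr;abc")); // 5
console.log(entityAwareLength("AT&T"));            // 4 (no semicolon, left alone)
```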
kennebec

If you are running the JavaScript in a browser, I would suggest letting the browser do the work. You can create an element and set its innerHTML to your string containing HTML entities, then extract the contents of that element as text.

Here is an example (uses Mootools): http://jsfiddle.net/mqchen/H73EV/

mqchen

Unfortunately, JavaScript does not natively support encoding or decoding of HTML entities, which is what you will need to do to get the 'real' string length. I was able to find this third-party library which is able to decode and encode HTML entities and it appears to work well enough, but there's no guaranteeing how complete it will be.

http://www.strictly-software.com/htmlencode
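If pulling in a library is not an option, a very small decoder is easy to sketch. This is an illustration under stated assumptions, not the strictly-software library's API: `decodeBasicEntities`, `decodedLength` and the tiny `NAMED_ENTITIES` table are hypothetical names, and only the listed named entities are recognized (the HTML spec defines over two thousand). Numeric references are decoded in full, and the final count is taken in code points so that an astral character such as `&#x1F600;` counts as 1.

```javascript
// Hypothetical, deliberately tiny table: the real HTML spec defines over
// two thousand named character references, so extend this as needed.
const NAMED_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: "\"",
  apos: "'",
  nbsp: "\u00A0",
  darr: "\u2193",
};

// Decode numeric (&#8595; / &#x2193;) and known named (&darr;) references.
// Unknown names and out-of-range numbers are left untouched.
function decodeBasicEntities(str) {
  return str.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // String.fromCodePoint throws above 0x10FFFF, so keep the raw text instead.
      return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// Count Unicode code points of the decoded string, so "&#x1F600;" counts as 1.
function decodedLength(str) {
  return Array.from(decodeBasicEntities(str)).length;
}

console.log(decodedLength("&darr;&amp;x")); // 3
console.log(decodedLength("&#x1F600;"));    // 1
```

Because `replace` does not rescan its own output, `"&amp;darr;"` decodes one level to the literal text `"&darr;"`, which is how a browser treats it too.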

Nathan Taylor

Using ES6 (which introduces codePointAt()):

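function strlen (str) {
    let sl = str.length
    let chars = sl
    // A code point above 0xFFFF (65535) is stored as a surrogate pair:
    // count it once and skip its second (low-surrogate) half.
    for (let i = 0; i < sl; i++) if (str.codePointAt(i) > 65535) {
       chars--;
       i++;
    }
    return chars
}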

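Beware: charCodeAt() does not work the same way. It returns a single UTF-16 code unit, which is never greater than 65535, so the check above would never be true.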
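ES6 string iteration also walks a string by code point, so the same count can be had without the manual loop; spreading the string into an array is one way (`codePointLength` is an illustrative name). Like the loop, it counts code points rather than HTML entities or user-perceived characters: a letter followed by a combining accent still counts as 2.

```javascript
// Counts Unicode code points: a surrogate pair counts once.
function codePointLength(str) {
  return [...str].length;
}

console.log("\u{1F600}".length);           // 2 (UTF-16 code units)
console.log(codePointLength("\u{1F600}")); // 1
console.log(codePointLength("e\u0301"));   // 2 (e + combining acute accent)
```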

CodeClown42