I need to determine the length of a string that may contain HTML entities.
For example, "&darr;" (↓) returns a length of 6, which is correct, but I want each entity to count as only 1 character.
<div id="foo">&darr;</div>
alert(document.getElementById("foo").innerHTML.length); // alerts 1
So, based on that rationale: create a div, set your mixed-up, entity-ridden string as its HTML, read the HTML back, and check the length.
var div = document.createElement("div");
div.innerHTML = "&darr;&darr;&darr;&darr;";
alert(div.innerHTML.length); // alerts 4
You might want to put that in a function for convenience, e.g.:
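function realLength(str) { // maybe there's a better name?
    var el = document.createElement("div");
    el.innerHTML = str; // the browser decodes the entities while parsing
    // Read the text back, not innerHTML: serializing re-escapes &, <, > and
    // non-breaking spaces, so "&amp;" would otherwise count as 5.
    return el.textContent.length;
}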
Since there's no solution using jQuery yet:
var str = 'lol&amp;';
alert($('<span />').html(str).text().length); // alerts 4
This uses the same approach as karim79's answer, and it never adds the created element to the document.
You could for most purposes assume that an ampersand followed by letters, or a possible '#' and numbers, followed by a semicolon, is one character.
var strlen=string.replace(/&#?[a-zA-Z0-9]+;/g,' ').length;
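For convenience, the one-liner above could be wrapped in a function. This is only a sketch: the name entityAwareLength is made up, and the pattern is slightly stricter than the original (it separates decimal, hexadecimal and named forms). It still does not check that a named entity actually exists, so something like "&T;" also counts as one character.

```javascript
// Hypothetical helper: counts every entity-shaped token as one character.
function entityAwareLength(str) {
  // &#8595;  (decimal)  |  &#x2193;  (hexadecimal)  |  &darr;  (named)
  const entity = /&(?:#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);/gi;
  // Swap each match for a single placeholder character, then measure.
  return str.replace(entity, "_").length;
}

console.log(entityAwareLength("&darr;"));   // 1
console.log(entityAwareLength("lol&amp;")); // 4
console.log(entityAwareLength("a & b"));    // 5 (a bare ampersand is left alone)
```

Because it needs no DOM, this also works outside the browser.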
If you are running the JavaScript in a browser, I would suggest letting the browser help you. Create an element and set its innerHTML to your string containing HTML entities, then read the contents of that element back as text.
Here is an example (uses Mootools): http://jsfiddle.net/mqchen/H73EV/
Unfortunately, JavaScript does not natively support encoding or decoding of HTML entities, which is what you need to do to get the 'real' string length. I was able to find this third-party library, which can decode and encode HTML entities and appears to work well enough, but there's no guarantee of how complete it is.
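To give an idea of what such a decoder does, here is a rough sketch, not a replacement for a complete library. The names NAMED, decodeEntities and decodedLength are invented for this example, and the named-entity table is deliberately tiny; a real decoder needs the full list of HTML named character references. Numeric references are handled generically, and anything unrecognized is left as-is.

```javascript
// Hypothetical, deliberately incomplete table of named entities.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00A0", darr: "\u2193" };

function decodeEntities(str) {
  return str.replace(/&(#x[0-9a-f]+|#\d+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      // Numeric reference: "#x..." is hexadecimal, "#..." is decimal.
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range values untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Named reference: decode only what the table knows about.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

// Spread iterates by code point, so characters above U+FFFF count once.
function decodedLength(str) {
  return [...decodeEntities(str)].length;
}

console.log(decodedLength("&darr;"));   // 1
console.log(decodedLength("lol&amp;")); // 4
```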
Using ES6 (which introduces codePointAt()):
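function strlen(str) {
    const sl = str.length;
    let chars = sl;
    // A code point above 0xFFFF occupies two UTF-16 code units (a surrogate pair).
    for (let i = 0; i < sl; i++) {
        if (str.codePointAt(i) > 65535) {
            chars--; // count the pair once
            i++;     // skip the low surrogate
        }
    }
    return chars;
}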
Beware: charCodeAt() does not work the same way. It returns a single UTF-16 code unit, so its result is never above 65535.
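A small sketch of the difference, using U+1F600 as an arbitrary example of a character outside the Basic Multilingual Plane. The helper name codePointLength is made up; it relies on the fact that string iteration in ES6 goes by code point, which gives a shorter way to get the same count as the loop above.

```javascript
const face = "\u{1F600}"; // one code point, stored as two UTF-16 code units

console.log(face.length);         // 2
console.log(face.charCodeAt(0));  // 55357  (0xD83D, the high surrogate only)
console.log(face.codePointAt(0)); // 128512 (0x1F600, the full code point)

// Hypothetical helper: Array.from walks the string by code point.
function codePointLength(str) {
  return Array.from(str).length;
}

console.log(codePointLength("a" + face + "b")); // 3
```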