
I want to count the number of characters in a text field on my website. The text field accepts any type of input from a user, including ASCII art and other special characters. If the user types normal characters, I can use strlen($message) to get the count, but if the user uses special characters (such as ©), the count is incorrect.

Is there a simple way to count everything without having to do any heavy lifting?

user1399181
  • http://php.net/mb_strlen for multi-byte character strings – Marc B Apr 09 '14 at 17:12
  • If you want to know how long a string will be on the screen, that's tough. `mb_strlen()` will only count "characters", but there are characters that don't display, characters that modify preceding characters (in Unicode, at least), etc. – Walter Tross Apr 09 '14 at 17:16

3 Answers


If your input is UTF-8 encoded and you want to count Unicode graphemes, you can do this:

$count = preg_match_all('/\X/u', $text);

Here is some explanation. A Unicode grapheme is a user-perceived "character": a base Unicode code point together with any "combining marks" that follow it. In PCRE, \X matches one such grapheme (an extended grapheme cluster).

mb_strlen($text, 'UTF-8') would count combining marks as separate characters, and strlen($text) would give you the total byte count.
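To see the three counts side by side, here is a small sketch that measures the same string three ways. It assumes PHP 7+ (for the "\u{...}" escape) and the mbstring extension (for mb_strlen). The sample is "e" followed by U+0301 COMBINING ACUTE ACCENT, which displays as a single é.

```php
<?php
// "e" + U+0301 COMBINING ACUTE ACCENT: one visible character, two code points.
$text = "e\u{0301}";

$bytes      = strlen($text);                   // UTF-8 bytes: 1 + 2 = 3
$codepoints = mb_strlen($text, 'UTF-8');       // code points: 2
$graphemes  = preg_match_all('/\X/u', $text);  // grapheme clusters: 1

printf("bytes=%d codepoints=%d graphemes=%d\n", $bytes, $codepoints, $graphemes);
```

Only the \X count matches what the user sees on screen.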

Since, judging by one of your comments, your input may have some characters converted to their HTML entity equivalents, you should first call html_entity_decode():

$count = preg_match_all('/\X/u', html_entity_decode($text, ENT_QUOTES, 'UTF-8'));
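The decode-then-count step can be wrapped in a small helper. This is a minimal sketch: the name countVisibleChars() is hypothetical, and the sample input simply imitates text in which "<" and "©" arrived as HTML entities.

```php
<?php
// Hypothetical helper: decode HTML entities, then count grapheme clusters.
function countVisibleChars(string $text): int
{
    $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $count = preg_match_all('/\X/u', $decoded);
    // preg_match_all returns false on error (e.g. invalid UTF-8); treat that as 0 here.
    return $count === false ? 0 : $count;
}

$raw = 'a &lt; b &copy;';
echo strlen($raw), "\n";             // 15: each entity counts as several bytes
echo countVisibleChars($raw), "\n";  // 7: the decoded text is "a < b ©"
```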

UPDATE

The intl PECL extension now provides grapheme_strlen() and other grapheme_*() functions (available only if the intl extension is installed).
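Because intl may not be installed everywhere, one option is to prefer grapheme_strlen() when it exists and fall back to the \X regex otherwise. A sketch under that assumption; graphemeCount() is a hypothetical name, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Prefer intl's grapheme_strlen() when available; otherwise fall back to the \X regex.
function graphemeCount(string $text): int
{
    if (function_exists('grapheme_strlen')) {
        $n = grapheme_strlen($text);  // may return false or null on invalid input
        if (is_int($n)) {
            return $n;
        }
    }
    $n = preg_match_all('/\X/u', $text);
    return $n === false ? 0 : $n;
}

echo graphemeCount("e\u{0301}"), "\n";  // 1
echo graphemeCount("©"), "\n";          // 1
```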

Walter Tross

Both strlen & mb_strlen are working fine for me.

Perhaps some of the special characters entered are not being displayed (Unicode). Try to find out which characters are not readable.

Hope this helps you.

  • mb_strlen does work for the examples I gave. But if someone types a less-than sign "<", WordPress converts it to &lt;, which counts as 4 characters. Since I believe that is the only character WordPress converts, I'm thinking I could count the number of less-than signs (n) and subtract 3n from the total character count. – user1399181 Apr 09 '14 at 17:59
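The adjustment idea can be sketched as follows, assuming (as the comment does) that "&lt;" is the only entity in the stored text. Each "&lt;" is four characters standing in for one, so the correction is three per occurrence. countWithLtAdjustment() is a hypothetical name; if any other entity can appear, decoding with html_entity_decode() before counting is the more robust route.

```php
<?php
// ASSUMPTION: "&lt;" is the only entity present in the stored text.
// Each "&lt;" is 4 characters representing 1, so subtract 3 per occurrence.
function countWithLtAdjustment(string $text): int
{
    $n = substr_count($text, '&lt;');
    return mb_strlen($text, 'UTF-8') - 3 * $n;
}

$stored = 'x &lt; y ©';
echo countWithLtAdjustment($stored), "\n";  // 7: the user typed "x < y ©"
```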

Here you go.

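// Returns how many extra bytes the multi-byte characters use. This equals the number
// of umlauts only when every non-ASCII character is 2 bytes in UTF-8 (true for ä, ö, ü, ß).
function countumlauts($str) {
    return strlen($str) - iconv_strlen($str, 'UTF-8');
}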

How it works: in UTF-8, special characters use more than one byte. strlen counts the bytes, while iconv_strlen counts the characters.
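For the original question, the character count is iconv_strlen() itself; the difference computed by countumlauts() only tells you how many extra bytes the multi-byte characters used. A small sketch, assuming the iconv extension is enabled and the script file is saved as UTF-8:

```php
<?php
// "Grüße": ü and ß take 2 bytes each in UTF-8; the other letters take 1 byte.
$str = 'Grüße';

$bytes = strlen($str);                 // 7
$chars = iconv_strlen($str, 'UTF-8');  // 5
$extra = $bytes - $chars;              // 2, what countumlauts() returns here

printf("bytes=%d chars=%d extra=%d\n", $bytes, $chars, $extra);
```

Note that iconv_strlen, like mb_strlen, counts code points, so a base letter plus a combining mark still counts as two.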

ALZlper