228

What's the main difference between length() and char_length()?

I believe it has something to do with binary and non-binary strings. Is there any practical reason to store strings as binary?

mysql> select length('MySQL'), char_length('MySQL');
+-----------------+----------------------+
| length('MySQL') | char_length('MySQL') |
+-----------------+----------------------+
|               5 |                    5 |
+-----------------+----------------------+
1 row in set (0.01 sec)
BalusC
  • 1,082,665
  • 372
  • 3,610
  • 3,555
Yada
  • 30,349
  • 24
  • 103
  • 144
  • 2
    Yes, there is practical reasons to store binary strings when they actually are binary strings, e.g. compressed. – sanmai Nov 14 '09 at 14:27

3 Answers3

371

LENGTH() returns the length of the string measured in bytes.
CHAR_LENGTH() returns the length of the string measured in characters.

This is especially relevant for Unicode, in which most characters are encoded in two bytes. Or UTF-8, where the number of bytes varies. For example:

select length(_utf8 '€'), char_length(_utf8 '€')
--> 3, 1

As you can see the Euro sign occupies 3 bytes (it's encoded as 0xE282AC in UTF-8) even though it's only one character.

Andomar
  • 232,371
  • 49
  • 380
  • 404
  • 6
    Only UCS-2 is encoded in two bytes per character. This encoding (or more accurately UTF-16LE) is what Windows misleadingly calls “Unicode”. MySQL doesn't support UTF-16; instead the usual approach for putting Unicode strings in it is to use UTF-8. – bobince Nov 14 '09 at 14:20
  • 2
    For example: select length('日本語'), char_length('日本語'); – sanmai Nov 14 '09 at 14:22
  • @bobince: Even UCS-2 encodes some characters in more than 2 bytes, for example `0313 combining comma above`. Since a = 61, 0x00610313 displays as a̓, and it takes up 4 bytes. – Andomar Nov 14 '09 at 14:32
  • 2
    Actually by Unicode terminology that's still 2 characters, even though like all combining marks it can — if a suitable font is available — be rendered as a single glyph. UTF-16LE can still have a 4-byte character though thanks to the surrogates. – bobince Nov 14 '09 at 22:05
  • 8
    So which of these functions should I use when figuring out how to resize my VARCHAR columns? When creating the table lets say one column is a VARCHAR(10). Will that allow a maximum of 10 characters, or a maximum of 10 bytes? – still_dreaming_1 Dec 17 '14 at 04:41
  • UCS-2 is NOT UTF-16LE. UCS-2 is 2 bytes per character, but UTF-16LE is Little Endian and 2 or 4 bytes per character (surrogate pairs to represent characters >FFFF (2 surrogates = 1 character)) – Rahly Aug 10 '17 at 17:11
20

varchar(10) will store 10 characters, which may be more than 10 bytes. In indexes, it will allocate the maximium length of the field - so if you are using UTF8-mb4, it will allocate 40 bytes for the 10 character field.

Brent
  • 209
  • 2
  • 2
0

Though here answer provided by @Andomar is correct, I am tempted to provide a more detailed answer.

NAME IN ENGLISH -

Suppose I create a variable @name storing my name -

SET @name = "Payel Senapati";

Now, I create a variable total_characters to store the number of characters my name occupy -

SET @total_characters =  CHAR_LENGTH(@name);
SELECT @total_characters;
+-------------------+
| @total_characters |
+-------------------+
|                14 |
+-------------------+

Case 1:

I create a variable @test to store @name converted to latin1 character set -

SET @test = CONVERT(@name USING latin1);

I create a variable @total_bytes and store the length of @test in terms of bytes -

SET @total_bytes = LENGTH(@test);
SELECT @total_bytes;
+--------------+
| @total_bytes |
+--------------+
|           14 |
+--------------+

Now, latin1 character set allocated 1 byte per character.

Thus, @total_characters = @total_bytes


Case 2:

Now, in variable @test I store @name converted to ucs2 character set -

SET @test = CONVERT(@name USING ucs2);

Now, in variable @total_bytes I store the length of @test in terms of bytes -

SET @total_bytes = LENGTH(@test);
SELECT @total_bytes;
+--------------+
| @total_bytes |
+--------------+
|           28 |
+--------------+

Now, ucs2 character set allocated 2 bytes per character.

Thus, 2 * @total_characters = @total_bytes


NAME IN HINDI -

Now, I store my name in variable @name in Hindi -

SET @name = "पायल सेनापति";

Now, in variable total_characters I store the number of characters my name occupy in Hindi -

SET @total_characters =  CHAR_LENGTH(@name);
SELECT @total_characters;
+-------------------+
| @total_characters |
+-------------------+
|                14 |
+-------------------+

Case 1:

Now, in variable @test I store @name converted to ucs2 character set -

SET @test = CONVERT(@name USING ucs2);

Now, in variable @total_bytes I store the length of @test in terms of bytes -

SET @total_bytes = LENGTH(@test);
SELECT @total_bytes;
+--------------+
| @total_bytes |
+--------------+
|           28 |
+--------------+

Now, ucs2 character set allocated 2 bytes per character.

Thus, 2 * @total_characters = @total_bytes


Case 2:

Now, in variable @test I store @name converted to utf32 character set -

SET @test = CONVERT(@name USING utf32);

Now, in variable @total_bytes I store the length of @test in terms of bytes -

SET @total_bytes = LENGTH(@test);
SELECT @total_bytes;
+--------------+
| @total_bytes |
+--------------+
|           56 |
+--------------+

Now, utf32 character set allocates 4 bytes per character.

Thus, 4 * @total_characters = @total_bytes


To see all character sets supported by MySQL use -

SHOW CHARACTER SET;
Payel Senapati
  • 1,134
  • 1
  • 11
  • 27