MySQL - length() vs char_length()

Question

What's the main difference between length() and char_length()?

I believe it has something to do with binary and non-binary strings. Is there any practical reason to store strings as binary?

mysql> select length('MySQL'), char_length('MySQL');
+-----------------+----------------------+
| length('MySQL') | char_length('MySQL') |
+-----------------+----------------------+
|               5 |                    5 |
+-----------------+----------------------+
1 row in set (0.01 sec)

Yes, there is practical reasons to store binary strings when they actually are binary strings, e.g. compressed. — sanmai, Nov 14 '09 at 14:27

Andomar · Accepted Answer · 2012-09-03T16:24:54.580

371

LENGTH() returns the length of the string measured in bytes.
CHAR_LENGTH() returns the length of the string measured in characters.

This is especially relevant for Unicode, in which most characters are encoded in two bytes. Or UTF-8, where the number of bytes varies. For example:

select length(_utf8 '€'), char_length(_utf8 '€')
--> 3, 1

As you can see the Euro sign occupies 3 bytes (it's encoded as 0xE282AC in UTF-8) even though it's only one character.

edited Sep 03 '12 at 16:24

answered Nov 14 '09 at 14:14

Andomar

232,371
49
380
404

6

Only UCS-2 is encoded in two bytes per character. This encoding (or more accurately UTF-16LE) is what Windows misleadingly calls “Unicode”. MySQL doesn't support UTF-16; instead the usual approach for putting Unicode strings in it is to use UTF-8. – bobince Nov 14 '09 at 14:20
2

For example: select length('日本語'), char_length('日本語'); – sanmai Nov 14 '09 at 14:22
@bobince: Even UCS-2 encodes some characters in more than 2 bytes, for example `0313 combining comma above`. Since a = 61, 0x00610313 displays as a̓, and it takes up 4 bytes. – Andomar Nov 14 '09 at 14:32
2

Actually by Unicode terminology that's still 2 characters, even though like all combining marks it can — if a suitable font is available — be rendered as a single glyph. UTF-16LE can still have a 4-byte character though thanks to the surrogates. – bobince Nov 14 '09 at 22:05
8

So which of these functions should I use when figuring out how to resize my VARCHAR columns? When creating the table lets say one column is a VARCHAR(10). Will that allow a maximum of 10 characters, or a maximum of 10 bytes? – still_dreaming_1 Dec 17 '14 at 04:41
UCS-2 is NOT UTF-16LE. UCS-2 is 2 bytes per character, but UTF-16LE is Little Endian and 2 or 4 bytes per character (surrogate pairs to represent characters >FFFF (2 surrogates = 1 character)) – Rahly Aug 10 '17 at 17:11

score 20 · Answer 2 · answered Apr 21 '15 at 20:01

20

varchar(10) will store 10 characters, which may be more than 10 bytes. In indexes, it will allocate the maximium length of the field - so if you are using UTF8-mb4, it will allocate 40 bytes for the 10 character field.

answered Apr 21 '15 at 20:01

Brent

209
2
2

How is this related to the question? – Payel Senapati Aug 29 '21 at 05:41

score 0 · Answer 3 · answered Sep 29 '21 at 19:08

Though here answer provided by @Andomar is correct, I am tempted to provide a more detailed answer.

NAME IN ENGLISH -

Suppose I create a variable @name storing my name -

SET @name = "Payel Senapati";

Now, I create a variable total_characters to store the number of characters my name occupy -

SET @total_characters =  CHAR_LENGTH(@name);
SELECT @total_characters;
+-------------------+
| @total_characters |
+-------------------+
|                14 |
+-------------------+

Case 1:

I create a variable @test to store @name converted to latin1 character set -

SET @test = CONVERT(@name USING latin1);

I create a variable @total_bytes and store the length of @test in terms of bytes -

SET @total_bytes = LENGTH(@test);
SELECT @total_bytes;
+--------------+
| @total_bytes |
+--------------+
|           14 |
+--------------+

Now, latin1 character set allocated 1 byte per character.

Thus, @total_characters = @total_bytes

Case 2:

Now, in variable @test I store @name converted to ucs2 character set -

SET @test = CONVERT(@name USING ucs2);

Now, in variable @total_bytes I store the length of @test in terms of bytes -

SET @total_bytes = LENGTH(@test);
SELECT @total_bytes;
+--------------+
| @total_bytes |
+--------------+
|           28 |
+--------------+

Now, ucs2 character set allocated 2 bytes per character.

Thus, 2 * @total_characters = @total_bytes

NAME IN HINDI -

Now, I store my name in variable @name in Hindi -

SET @name = "पायल सेनापति";

Now, in variable total_characters I store the number of characters my name occupy in Hindi -

SET @total_characters =  CHAR_LENGTH(@name);
SELECT @total_characters;
+-------------------+
| @total_characters |
+-------------------+
|                14 |
+-------------------+

Case 1:

Now, in variable @test I store @name converted to ucs2 character set -

SET @test = CONVERT(@name USING ucs2);

Now, in variable @total_bytes I store the length of @test in terms of bytes -

SET @total_bytes = LENGTH(@test);
SELECT @total_bytes;
+--------------+
| @total_bytes |
+--------------+
|           28 |
+--------------+

Now, ucs2 character set allocated 2 bytes per character.

Thus, 2 * @total_characters = @total_bytes

Case 2:

Now, in variable @test I store @name converted to utf32 character set -

SET @test = CONVERT(@name USING utf32);

Now, in variable @total_bytes I store the length of @test in terms of bytes -

SET @total_bytes = LENGTH(@test);
SELECT @total_bytes;
+--------------+
| @total_bytes |
+--------------+
|           56 |
+--------------+

Now, utf32 character set allocates 4 bytes per character.

Thus, 4 * @total_characters = @total_bytes

To see all character sets supported by MySQL use -

SHOW CHARACTER SET;

MySQL - length() vs char_length()

3 Answers3

Linked