-1

I have a string that contains 253 characters, but takes up 261 bytes due to some of the characters being longer than one byte (in this case, it contains several zero-width spaces).

I want to insert this string into a VARCHAR(255) column in a utf8mb4 table, but it throws this error:

INSERT INTO mytable(mycolumn) VALUES("This string contains 253 ​characters which should ​fit inside of a varchar(255) column, but because ​of the zero width spaces, it takes up more than 255 bytes and therefore throws an error when you try to insert it into the table and that makes me ​sad.")

/*Error 1406: Data too long for column*/

Here's an example of what the two length functions output:

/* Returns 253 */
SELECT CHAR_LENGTH("This string contains 253 ​characters which should ​fit inside of a varchar(255) column, but because ​of the zero width spaces, it takes up more than 255 bytes and therefore throws an error when you try to insert it into the table and that makes me ​sad.");

/* Returns 261 */
SELECT LENGTH("This string contains 253 ​characters which should ​fit inside of a varchar(255) column, but because ​of the zero width spaces, it takes up more than 255 bytes and therefore throws an error when you try to insert it into the table and that makes me ​sad.");

To fix that, I need to trim off just enough characters for the string to be within the byte limit.

Unfortunately, the LEFT() function doesn't appear to be suitable for this:

/* Still returns 261 because it counts characters, not bytes */
SELECT LENGTH(LEFT("This string contains 253 ​characters which should ​fit inside of a varchar(255) column, but because ​of the zero width spaces, it takes up more than 255 bytes and therefore throws an error when you try to insert it into the table and that makes me ​sad.", 255));

So, how can I trim the string down?

Pikamander2
  • 7,332
  • 3
  • 48
  • 69
  • Does this answer your question? [Replace NonASCII Characters in MYSQL](https://stackoverflow.com/questions/20766939/replace-nonascii-characters-in-mysql) – P.Salmon Dec 30 '20 at 10:27
  • @P.Salmon - No, that's about replacing characters. I want to keep the zero-width characters, but trim off the end of the string until it's an acceptable length. – Pikamander2 Dec 30 '20 at 10:30
  • Check this: https://stackoverflow.com/questions/51517927/mysql-select-first-10-bytes-of-a-string – forpas Dec 30 '20 at 10:53

1 Answers1

0

One workaround is to process the string ahead of time using another language.

For example, here's a working solution in Python:

def trim_string_to_x_bytes(string, max_bytes, character_encoding):
    string = string[:max_bytes]
    
    while (len(string.encode(character_encoding)) > max_bytes):
        string = string[:-1]
    
    return string

long_string = "This string contains 253 ​characters which should ​fit inside of a varchar(255) column, but because ​of the zero width spaces, it takes up more than 255 bytes and therefore throws an error when you try to insert it into the table and that makes me ​sad."

long_string = trim_string_to_x_bytes(long_string, 255, 'utf8')

cursor.execute("INSERT INTO mytable(mycolumn) VALUES(%s)", (long_string,))

That said, a pure MySQL solution would be better if one exists.

Pikamander2
  • 7,332
  • 3
  • 48
  • 69