We have a JDBC program which moves data from one database to another.
- The source database uses the Latin-9 (ISO-8859-15) character set.
- The destination database uses UTF-8 encoding, and the size of a column is specified in bytes rather than characters.
- We converted the DDL scripts of the source database to equivalent scripts for the destination database, keeping the column sizes as-is.
- In some cases, when the data contains special characters, its size after conversion to UTF-8 exceeds the column size in the destination database, causing the JDBC program to fail.
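To illustrate the failure mode, here is a minimal sketch of a pre-insert check (the method name and column size are hypothetical, not from our actual program): it compares the UTF-8 byte length of a value against a byte-denominated column limit.

```java
import java.nio.charset.StandardCharsets;

public class ColumnFitCheck {
    // Returns true if the value, encoded as UTF-8, fits in a column
    // whose size is declared in bytes (hypothetical helper for illustration).
    static boolean fitsInColumn(String value, int columnSizeBytes) {
        return value.getBytes(StandardCharsets.UTF_8).length <= columnSizeBytes;
    }

    public static void main(String[] args) {
        // "café" is 4 characters but 5 UTF-8 bytes, so it overflows a 4-byte column.
        System.out.println(fitsInColumn("cafe", 4)); // true
        System.out.println(fitsInColumn("café", 4)); // false
    }
}
```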
I understand that UTF-8 is a variable-width encoding that takes 1-4 bytes per character, so the worst-case solution would be to allocate 4 times the source column size in the destination database.
Is there a better estimate?
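One way to sanity-check the worst case empirically (a sketch, assuming your JRE ships the ISO-8859-15 charset, as mainstream JREs do) is to decode every possible Latin-9 byte value and measure how many UTF-8 bytes it becomes:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin9Expansion {
    // Returns the largest UTF-8 byte length produced by any single Latin-9 character,
    // found by decoding all 256 possible byte values.
    static int maxUtf8BytesPerLatin9Char() {
        Charset latin9 = Charset.forName("ISO-8859-15");
        int max = 0;
        for (int b = 0; b < 256; b++) {
            String s = new String(new byte[] {(byte) b}, latin9);
            max = Math.max(max, s.getBytes(StandardCharsets.UTF_8).length);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("Worst-case expansion: "
                + maxUtf8BytesPerLatin9Char() + " bytes per character"); // prints 3
    }
}
```

Running this reports 3: the euro sign (U+20AC) is the widest character in Latin-9; all others fit in 1 or 2 UTF-8 bytes.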