I know this has been asked many times, but I'm having a hard time understanding the difference in collation and encoding.
OK, so we have a MySQL database (several actually) that have tables with mixed collation. Now, some columns are utf8_unicode_ci
and some are the default of latin1_swedish_ci
. We have since used the utf8_unicode_ci
as the default. But we have a lot of "legacy" columns in the latin1.
We like consistency and would like to convert all columns to the utf8. Fortunately, 99.9% of our data is in English with the (literally) couple of records that have Spanish addresses (tildes). So pretty simple stuff for us but we want to do it the "right way" and fully support any language. Hence, the utf8_unicode.
We would rather not have to go through every column in every table and make that change. Is there a simple script that would covert every column to utf8_unicode_ci
?
Also, what are some pitfalls we need to look for? I mean, if I'm wrong and there are records in Russian or Japanese that I'm not aware of then will I have problems?
And maybe this should be a side question, but for encoding, UTF8 is better because it can store any language, correct? As in, each character would have 2 or 4 bytes that represent what it should be in a "global" alphabet of thousands of characters? Sorry if that analogy sounds stupid but trying to understand. And collation is just how they are sorted when I query. So that if the characters were stored like A8 9F
for the letter H
and B3 2E
for the letter e
then H
would come first because A8
comes before B3
?
Ha, my head is spinning now. Sorry if this seems to be three different questions but when I go to management and say I need to convert columns, this is why....
Thanks