0

In 2019, Microsoft added UTF-8 for SQL Server: https://techcommunity.microsoft.com/t5/sql-server/introducing-utf-8-support-for-sql-server/ba-p/734928

Previously dropped characters are suddenly supported. But UTF-8 takes more space.

Deploying the new dacpac causes the database to rebuild for a few hours. We cannot afford this on our production server. In the other hand, we would still like UTF-8 support.

I was wondering whether there is a workaround to this problem. Maybe turn on IgnoreColumnCollation in the parameters of SqlPackage? Will UTF-8 still be supported if we do this?

If not, can we do what the documentation states?

To convert it in-place, run an alter statement, like the following example:

ALTER TABLE dbo.MyTable
ALTER COLUMN MyString VARCHAR(50) COLLATE Latin1_General_100_CI_AI_SC_UTF8

It's very easy to implement, however this is a possibly blocking operation which may pose an issue for > large tables and busy applications.

this is a possibly blocking operation

Is it? How do we know what columns need to be changed?

Wouter
  • 154
  • 17
  • *"Previously dropped characters are suddenly supported."* This isn't true, you just had to ensure you used an `nvarchar`. – Thom A Aug 21 '20 at 13:18
  • 2
    UTF-8 collations are a fairly new application. Per Larnu, the regular Unicode types will typically continue to serve you well. It is definitely not the case UTF-8 collations are necessary to support anything new; they just offer potentially better storage and performance in particular scenarios. They should not (yet) be the first thing you go to for storing Unicode data. Regardless of whether you switch to Unicode types or change the collation, either approach requires rebuilding entire tables (because data must be converted), which is only free if they're empty. – Jeroen Mostert Aug 21 '20 at 13:21
  • Thanks! But we can change the collation beforehand? Is this for all VARCHAR columns? – Wouter Aug 24 '20 at 08:59
  • The collation of columns is a per-column setting that defaults to the database default if not specified on table creation. In a database project, this is a setting in the properties of the project. Of course, changing the collation this way requires redeploying the entire DB. Collation applies to *all* text columns, including `NVARCHAR`, but only for `VARCHAR` does it determine the encoding as well as the sort and comparison. For `NVARCHAR` it determines character ordering, but not what can be stored, as those fields always use Unicode. – Jeroen Mostert Aug 24 '20 at 10:26

0 Answers0