SQL Server Convert ASCII HTML to Unicode Character - Mass Convert/String Search

Question

I have a data dump that was originally in an Azure database and has since been migrated to a table on SQL Server. For the most part, the data migration went smoothly. There were, however, some issues with some of the string data. I do not know what the source settings were for the database, table or column(s).

Issue:

String data, such as Last Name, came across with mixed Unicode and non-Unicode (I believe I have that correct) data. Some examples are:

Petrėnaitĩ

Černikova

Zalāne

There are a number of different "codings" being mixed into the strings since this is happening in both the First and Last Name columns, City columns, and Address columns. All columns are nvarchar and have the collation SQL_Latin1_General_CP1_CI_AS in the current database where I am seeing this issue.

Request/Question:

Is there any way to easily update all of these records (roughly 2,000) to convert all the "codings" into what their "human-friendly" format should be?

Updates

I have tried to gain access to the Azure database, but even with access it would not help: the values are passed from the web to the database with the Unicode values converted to ASCII, and the web converts them back upon extraction.

I cannot strip these values out, because the names would then no longer be clean or complete.

A REGEX Search & Replace would be difficult due to the number of ASCII values.

You could probably use methods on the `XML` datatype to do this but I'd probably use CLR to do it. Related question http://stackoverflow.com/questions/5783817/convert-character-entities-to-their-unicode-equivalents — Martin Smith, Oct 23 '15 at 21:18
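
As an illustration of the decoding step the comment refers to, here is a minimal sketch in Python rather than T-SQL or CLR. It assumes the stored "codings" are HTML character references such as `&#279;` (the question does not show the raw stored values, so this is a guess), and the sample rows and the `decode_entities` helper are hypothetical. The standard-library function `html.unescape` handles decimal, hexadecimal and named references, so no per-character search and replace list is needed.

```python
import html


def decode_entities(value):
    """Decode HTML character references; pass None (SQL NULL) through unchanged."""
    if value is None:
        return None
    return html.unescape(value)


# Hypothetical (id, last_name) rows, assuming the names were stored as
# numeric character references; the real stored form may differ.
rows = [
    (1, "Petr&#279;nait&#297;"),
    (2, "&#268;ernikova"),
    (3, "Zal&#257;ne"),
    (4, "Smith"),
]

# Keep only the rows whose value actually changes, so that an UPDATE
# driven by this list touches nothing else.
updates = [
    (row_id, decode_entities(name))
    for row_id, name in rows
    if decode_entities(name) != name
]

for row_id, name in updates:
    print(row_id, name)
```

The resulting pairs could then be written back with a parameterized `UPDATE` through whichever database driver is available. Note that `html.unescape` would also turn a literal `&amp;` into `&`, which may or may not be wanted for address data.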

