1

We are migrating a classic ASP application, using SQL Server, to multilanguage, and for this, as first step we have to migrate from ansi / windows-1252 charset to Unicode.

We have succeed moving the ASP programs (saving then in UTF-8 with BOM do the trick), but now we face the SQL Server issues.

We have converted all columns from CHAR to NCHAR and from VARCHAR to NVARCHAR.

Our problem appears in dynamic SQL statements (I know, I know, we'll remove it in future).

When we use Unicode string literal, SQL Server manual requires to use N'MyUnicode' format (prefix with uppercase N the literal). But this is a nightmare for us (we have to look for code, searching for literals, but only in SQL statements.

Is there any way to request SQL to consider any literal as Unicode?

Our test code: 1 and 3 works, 2 fails

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Language" content="en" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>

<% 
  ' Respose charset UTF-8 is equivalent to "Content-Type" content="text/html; charset=UTF-8"
  Response.Charset="UTF-8"
  Response.LCID=1033
  Session.LCID=1033
  Response.Write "Testing charset UTF-8 with BOM: " & "áéΔ" & "<br/>"

  Set Conn = Server.CreateObject("ADODB.Connection")
  Conn.Open "Provider=SQLOLEDB; Database=testUnicode;User Id=test;Password=xxxxxxxx"    

  SQLStmt = "select * from test"
  Set rs = Conn.execute(SQLStmt)
  If (rs.EOF) then
    Response.Write "Select (all records) failed. No record return.<br/>"
  Else
    Response.Write "Select (all records) succeed. The table contains:<br/>"
    Do while not rs.EOF
        Response.Write rs.fields("HTML") & " = " & Server.HTMLEncode(rs.fields("UNICODE")) & " Value=" & AscW(rs.fields("UNICODE")) & "<br/>"
        valueToSearch=rs.fields("UNICODE")
        rs.MoveNext
    Loop
    rs.Close
    Response.Write "<br/><br/>"
    Response.Write "Checking for specific record<br/>"

    ' This works!!
    valueToSearch="Δ"
    Response.Write "Checking for value = " & AscW(valueToSearch) & "<br/>"
    SQLStmt = "select * from test where UNICODE='" & valueToSearch & "'"
    Set rs = Conn.execute(SQLStmt)
    If (rs.EOF) then
        Response.Write "Select (one record " & Server.HTMLEncode(valueToSearch) & ") failed. No record return. Select=" & SQLStmt & "<br/>"
    Else
        Response.Write "Select (one record " & Server.HTMLEncode(valueToSearch) & ") succeed.<br/>"
        rs.MoveFirst
        Response.Write rs.fields("HTML") +" = "+Server.HTMLEncode(rs.fields("UNICODE"))+"<br/>"
        Response.Write "TESTBIT = " + Server.HTMLEncode(rs.fields("TESTBIT"))+"<br/>"
        Response.Write "TEXTBIT TEXTO = " + trim(rs.fields("TESTBIT")) +"<br/>"        
    End if

    Response.Write "<br/>"


    ' This fail!
    valueToSearch="é"
    Response.Write "Checking for value = " & AscW(valueToSearch) & "<br/>"
    SQLStmt = "select * from test where UNICODE='" & valueToSearch & "'"
    Set rs = Conn.execute(SQLStmt)
    If (rs.EOF) then
        Response.Write "Select (one record " & Server.HTMLEncode(valueToSearch) & ") failed. No record return. Select=" & SQLStmt & "<br/>"
    Else
        Response.Write "Select (one record " & Server.HTMLEncode(valueToSearch) & ") succeed.<br/>"
        rs.MoveFirst
        Response.Write rs.fields("HTML") +" = "+Server.HTMLEncode(rs.fields("UNICODE"))+"<br/>"
        Response.Write "TESTBIT = " + Server.HTMLEncode(rs.fields("TESTBIT"))+"<br/>"
        Response.Write "TEXTBIT TEXTO = " + trim(rs.fields("TESTBIT")) +"<br/>"
    End if

    Response.Write "<br/>"

    ' This works!
    valueToSearch="é"
    Response.Write "Checking for value = " & AscW(valueToSearch) & "<br/>"
    SQLStmt = "select * from test where UNICODE=N'" & valueToSearch & "'"
    Set rs = Conn.execute(SQLStmt)
    If (rs.EOF) then
        Response.Write "Select (one record " & Server.HTMLEncode(valueToSearch) & ") failed. No record return. Select=" & SQLStmt & "<br/>"
    Else
        Response.Write "Select (one record " & Server.HTMLEncode(valueToSearch) & ") succeed.<br/>"
        rs.MoveFirst
        Response.Write rs.fields("HTML") +" = "+Server.HTMLEncode(rs.fields("UNICODE"))+"<br/>"
        Response.Write "TESTBIT = " + Server.HTMLEncode(rs.fields("TESTBIT"))+"<br/>"
        Response.Write "TEXTBIT TEXTO = " + trim(rs.fields("TESTBIT")) +"<br/>"
    End if


  End if


  Conn.Close
%>
</body>

Our test table is like this

CREATE TABLE [dbo].[test]
(
    [HTML] [NVARCHAR](50) NULL,
    [UNICODE] [NVARCHAR](50) NULL,
    [TESTBIT] BIT
) ON [PRIMARY]
GO

INSERT [dbo].[test] ([HTML], [UNICODE], TESTBIT) 
VALUES (N'&aacute;', N'á', 1), (N'&eacute;', N'é', 1),
       /* Greek Delta Δ */
       (N'&#916;', N'Δ',0);
GO
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Sourcerer
  • 1,891
  • 1
  • 19
  • 32
  • 1
    No, you cannot. It's embedded deep in the *parser*. By the time any more complex mechanism kicks in, the damage is already done. Most of that code is *ancient*. And by now, decades of history says that `''` is interpreted by collation/code page rules and `N''` is the way to play by Unicode rules. Anything setting that shifted the rules from under that much existing code would be considered dangerous. – Damien_The_Unbeliever Aug 24 '18 at 17:27
  • Thanks! This is incredible for me. A 2017 database claiming being SQL-92 compliant.... Really thanks! – Sourcerer Aug 24 '18 at 23:41
  • @Sourcerer what do you expect? If you have data already not stored as unicode, how do you expect SQL Server to convert it to unicode for you without a manual process? The fact it’s SQL-92 compliance has nothing to do with what decision was made when designing the database to go with non-unicode datatypes. – user692942 Aug 25 '18 at 13:41
  • The data has already converted fine (we have created the destination database with NCHAR/NVARCHAR and executed an INSERT AS SELECT from original). Our problem is only with legacy code (and my complain with SQL-92 is about language standards). But, let me explain, it's only a cry from an old programmer (I have suffered that kinds of things in Host DB2, in Oracle (when they added UTF8 support, and, 20 year later, now with a really old code) – Sourcerer Aug 25 '18 at 15:19
  • You seem to have missed out `Response.CodePage = 65001`, which is often very useful when you want to get utf-8 from a database – John Aug 26 '18 at 14:37
  • Also tested, but Codepage=65001 it's only relevant for how the response is built (and saving the page with BOM do the same trick without altering the code) – Sourcerer Aug 26 '18 at 16:41

0 Answers0