Attempting to make my first ASP.NET page. Got IIS 5.1 on XP, configured to run .NET 4. Created a new virtual directory and added an .aspx file. When I browse the file, non-ASCII characters are corrupted. For instance, ü (U+00FC) is transformed to Ã¼ (U+00C3 U+00BC), which is the I-don't-get-this-is-UTF-8 equivalent.
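The corruption described above can be reproduced outside ASP.NET. The following is a minimal Python sketch, assuming the page bytes are UTF-8 and the reader decodes them as Windows-1252 (the specific code page is an assumption; any Latin-1-like single-byte encoding gives the same result for this character):

```python
# Reproduce the corruption: UTF-8 bytes decoded with a single-byte code page.
original = "\u00fc"                    # ü
utf8_bytes = original.encode("utf-8")  # two bytes: 0xC3 0xBC

# Assumption: the reader treats the file as Windows-1252 instead of UTF-8.
misread = utf8_bytes.decode("cp1252")  # each byte becomes its own character

print(utf8_bytes.hex(" "))                   # c3 bc
print([f"U+{ord(c):04X}" for c in misread])  # ['U+00C3', 'U+00BC']
```

The two code points printed are Ã and ¼, which is exactly the pair seen in the browser.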

I have tried various ways of fixing this:

  1. I made sure the .aspx file is indeed encoded as UTF-8.
  2. I set the meta tag:

    <meta charset="UTF-8">

  3. I set the virtual directory to handle .aspx as text/html;charset=utf-8 under HTTP Headers > File Type in IIS.

  4. I added ResponseEncoding="utf-8" to <%@ Page ... %>.
  5. I passed the string through HttpUtility.HtmlEncode(). Now the ü was transformed to Ã¼ (U+00C3 U+00BC).

Finally, I found 2 ways that worked:

  1. Replacing non-ASCII characters with character references, such as &#252;. This was okay in the '90s, but not today.
  2. Adding a web.config file to the virtual directory, with this content:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <system.web>
        <globalization fileEncoding="utf-8"/>
      </system.web>
    </configuration>
    

Without the fileEncoding setting, the ASP.NET parser reads the .aspx and corrupts every non-ASCII character without attempting to infer the file encoding. Is this just something you pros have learned to live with, or am I missing something? Is a web.config file with globalization settings the way to handle "international" characters on .aspx pages? I don't remember having similar problems with PHP, so I'm puzzled why this crops up with ASP.NET.

sharptooth
Gustaf Liljegren
  • I found another way of making it work without the web.config file: save the .aspx page as UTF-8 with a byte-order mark (BOM). In general, UTF-8 shouldn't need a BOM, since the byte order is implicit in the encoding, but Microsoft has a tradition of requiring it, which is probably the right thing to do, since it makes inferring the file encoding more robust. I guess this is the kind of solution I was looking for, but comments are still welcome. – Gustaf Liljegren May 13 '12 at 14:28
  • You should consider installing the Microsoft Web Platform Installer and using IIS Express 7.5 with WebMatrix or VS 2010 Express – Nikola Sivkov May 13 '12 at 17:28

2 Answers

To use non-ASCII characters you need two things: save the files as UTF-8, and make sure you have these settings in your web.config:

    <globalization requestEncoding="utf-8" responseEncoding="utf-8" fileEncoding="utf-8" />

Note that ASP.NET always has a web.config. There is the global one, which also has these settings and lives in the ASP.NET directory {drive:}\WINDOWS\Microsoft.NET\Framework\{version}\CONFIG\, and then there is the web.config in your project. Sometimes the global one takes the encoding from the current country. In that case you need to set it back to UTF-8 in your project.

You have found all that already; I just want to point out the 3 settings:

  1. Save your files as UTF-8.
  2. Set requestEncoding="utf-8"
  3. Set responseEncoding="utf-8"
Aristos

You have three options.

Option 1 - either entity-encode all characters that don't fit into ASCII or replace them with similar-looking ASCII equivalents. This is error-prone and hard to maintain. The next time you have to incorporate a large piece of text, you may forget to check the included piece and it "looks garbage" again.
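For illustration only, entity-encoding can be automated rather than done by hand. This is a small Python sketch, not an ASP.NET facility; the helper name `to_char_refs` is made up for this example:

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    # "xmlcharrefreplace" is a standard Python error handler for encoding.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_char_refs("f\u00fcr"))  # f&#252;r
```

Even when scripted, this still has to be rerun on every new piece of text, which is the maintenance burden mentioned above.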

Option 2 - save the .aspx as "UTF-8 with BOM". Such files are handled properly automatically - that's documented in the description of the fileEncoding property of the system.web/globalization section of web.config. This is also hard to maintain - the next time the file gets resaved as "UTF-8" (without BOM) it "looks garbage" again, and that may go unnoticed. When you add new .aspx files you'll have to check that they are saved as "UTF-8 with BOM" too. This approach is error-prone - for example, some file comparison tools don't show an added or removed BOM (at least with default settings).
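If you do go the BOM route, a periodic check can catch files that silently lost their BOM. The following Python sketch is one possible approach; the function names and the `*.aspx` glob are assumptions for this example, not part of any ASP.NET tooling:

```python
import codecs
from pathlib import Path
from typing import List


def has_utf8_bom(raw: bytes) -> bool:
    """True if the data starts with the UTF-8 byte-order mark (EF BB BF)."""
    return raw.startswith(codecs.BOM_UTF8)


def needs_attention(raw: bytes) -> bool:
    """True if the data contains non-ASCII bytes but carries no UTF-8 BOM."""
    return not has_utf8_bom(raw) and any(b > 0x7F for b in raw)


def find_risky_pages(root: str) -> List[Path]:
    """List .aspx files under root that hold non-ASCII bytes without a BOM."""
    return sorted(
        p for p in Path(root).rglob("*.aspx") if needs_attention(p.read_bytes())
    )
```

Calling `find_risky_pages(".")` from the site root would list the pages that may render as garbage if fileEncoding is not set.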

Option 3 - ensure the file is saved as either "UTF-8" or "UTF-8 with BOM" and at the same time set the fileEncoding property of the system.web/globalization section of web.config to utf-8. The default value of this property is "single byte character encoding", so files with non-ASCII characters saved as UTF-8 are handled improperly and the result "looks garbage". This is the most maintainable approach - it's easy to see, easy to verify, and doesn't randomly break when a file is resaved. fileEncoding is the only one of the three ???Encoding properties that defaults to "single byte character encoding" - responseEncoding and requestEncoding default to utf-8, so in most cases there's no need to change (or set) them; setting fileEncoding is usually enough.
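The behaviour described in options 2 and 3 can be summarised as a toy model. This Python sketch is not ASP.NET's actual code: the function `decode_page` is hypothetical, cp1252 merely stands in for the "single byte character encoding" default, and letting the BOM take precedence over the configured value is an assumption of the model:

```python
import codecs
from typing import Optional


def decode_page(raw: bytes, file_encoding: Optional[str] = None) -> str:
    """Toy model of how a page's bytes become text (not ASP.NET's real logic)."""
    # A UTF-8 BOM identifies the encoding regardless of configuration.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # Assumption: with no BOM and no fileEncoding, fall back to cp1252.
    return raw.decode(file_encoding or "cp1252", errors="replace")


page = "\u00fc".encode("utf-8")                  # UTF-8 without BOM

broken = decode_page(page)                       # default: two wrong characters
fixed = decode_page(page, "utf-8")               # fileEncoding="utf-8"
with_bom = decode_page(codecs.BOM_UTF8 + page)   # saved as UTF-8 with BOM

print(len(broken), len(fixed), len(with_bom))    # 2 1 1
```

Only the first call goes wrong, which matches the observation that either a BOM or an explicit fileEncoding is enough on its own.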

sharptooth