
IIS appears to deliver the request URL incorrectly to a web application if the URL contains UTF-8-encoded characters that are not supported by the current system locale. All "unsupported" characters are replaced by question marks ('?').

Example: The system locale is set to Norwegian. The following URL works fine:

/myapp/Blåbærsyltetøy/

The following URL does not work:

/myapp/черничный-джем/

In both URLs, non-ASCII characters are encoded as UTF-8 and then percent-encoded, so the actual URLs look like this:

/myapp/Bl%C3%A5b%C3%A6rsyltet%C3%B8y/
/myapp/%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D1%87%D0%BD%D1%8B%D0%B9-%D0%B4%D0%B6%D0%B5%D0%BC/
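To make the encoding step concrete, here is a minimal Python sketch that round-trips the two example paths using only the standard library. The helper names `decode_path` and `encode_path` are made up for illustration. It shows that both URLs are valid UTF-8 once percent-decoded, so the problem is not in the URLs themselves.

```python
from urllib.parse import quote, unquote

# The two request paths from the example, as they appear on the wire.
NORWEGIAN = "/myapp/Bl%C3%A5b%C3%A6rsyltet%C3%B8y/"
RUSSIAN = ("/myapp/%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D1%87%D0%BD%D1%8B%D0%B9"
           "-%D0%B4%D0%B6%D0%B5%D0%BC/")


def decode_path(encoded_path: str) -> str:
    """Percent-decode a path and interpret the resulting bytes as UTF-8."""
    return unquote(encoded_path, encoding="utf-8", errors="strict")


def encode_path(path: str) -> str:
    """Encode a path as UTF-8, then percent-encode all non-ASCII bytes."""
    return quote(path, safe="/", encoding="utf-8")


if __name__ == "__main__":
    for encoded in (NORWEGIAN, RUSSIAN):
        print(encoded, "->", decode_path(encoded))
```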

The application uses two ways of handling requests:

  • wfastcgi + Python
  • ISAPI + C++

Both suffer from the same problem, and neither has any problem if the URL contains only characters that are supported by the system locale.

In the case of ISAPI, it looks like EXTENSION_CONTROL_BLOCK::lpszPathInfo already delivers a percent-decoded URL in which all "unsupported" characters have been replaced by question marks. The lpszPathInfo member is a multi-byte character string, and there is no wide-character version of this structure.
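The symptom can be reproduced outside IIS. The sketch below is only a model of the observed behavior, not IIS's actual implementation: it percent-decodes the path, then converts the text to a single-byte code page with replacement. It assumes that Windows code page 1252 (`cp1252`) is the ANSI code page for a Norwegian system locale; the function name is hypothetical.

```python
from urllib.parse import unquote_to_bytes

# Assumption: cp1252 stands in for the ANSI code page of a Norwegian locale.
ANSI_CODE_PAGE = "cp1252"


def simulate_lossy_path_info(encoded_path: str,
                             code_page: str = ANSI_CODE_PAGE) -> bytes:
    """Model the lossy conversion: UTF-8 text squeezed into one code page."""
    raw = unquote_to_bytes(encoded_path)   # percent-decode to raw bytes
    text = raw.decode("utf-8")             # the bytes are valid UTF-8
    # Characters missing from the code page are replaced by b"?".
    return text.encode(code_page, errors="replace")


if __name__ == "__main__":
    print(simulate_lossy_path_info("/myapp/Bl%C3%A5b%C3%A6rsyltet%C3%B8y/"))
    print(simulate_lossy_path_info("/myapp/%D1%87%D0%B5%D1%80/"))
```

The Norwegian path survives because every character exists in the code page, while each Cyrillic letter collapses to a question mark and cannot be recovered afterwards.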

Is there a way to get the original, percent-encoded URL or prevent IIS from decoding URLs to work around the problem?

Florian Winter
  • For ISAPI, the solution is to get the URL from the server variable `HTTP_URL`, rather than `PATH_INFO`. This delivers the raw, percent-encoded URL, which then can be decoded correctly. In a wfastcgi script `HTTP_URL` is not available, and trying to access it in Python results in `KeyError`. – Florian Winter Oct 19 '17 at 10:26
  • Tried this workaround for wfastcgi: https://support.microsoft.com/en-us/help/2277918/fix-a-php-application-that-depends-on-the-request-uri-server-variable - Result: URLs no longer contain question marks. Instead, they contain percent-encoded bytes that become gibberish when interpreted as UTF-8. – Florian Winter Oct 19 '17 at 10:50
  • Correction to my previous comment: The hotfix and registry variable described here https://support.microsoft.com/en-us/help/2277918/fix-a-php-application-that-depends-on-the-request-uri-server-variable actually solves the problem for wfastcgi. – Florian Winter Oct 19 '17 at 11:00

1 Answer


Solution for ISAPI

Get the request URL from the server variable HTTP_URL rather than PATH_INFO. This delivers the original, percent-encoded URL, which can then be decoded correctly (by percent-decoding it to an array of bytes and interpreting that array as a UTF-8-encoded string).

This variable also includes the query string, and it holds the original path as it was before URL rewriting. Either may be unwanted, so the value may need some extra processing.
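The decoding and query-string handling described above can be sketched as follows. This is Python rather than the C++ an ISAPI extension would use, and it only illustrates the logic; `path_from_http_url` is a hypothetical helper, and the sketch does nothing about URL rewriting.

```python
from urllib.parse import unquote_to_bytes


def path_from_http_url(http_url: str) -> str:
    """Extract the decoded path from a raw, percent-encoded HTTP_URL value."""
    # Everything before the first '?' is the path; the rest is the query.
    raw_path, _, _query = http_url.partition("?")
    # Percent-decode to bytes first, then interpret those bytes as UTF-8.
    return unquote_to_bytes(raw_path).decode("utf-8")


if __name__ == "__main__":
    url = ("/myapp/%D1%87%D0%B5%D1%80%D0%BD%D0%B8%D1%87%D0%BD%D1%8B%D0%B9"
           "-%D0%B4%D0%B6%D0%B5%D0%BC/?page=2")
    print(path_from_http_url(url))
```

Splitting off the query string before decoding matters: a percent-encoded `%3F` in the path would otherwise turn into a literal `?` and be mistaken for the query separator.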

Also, for error handler requests, this variable contains a string in a format similar to

<DLL_PATH>?<STATUS_CODE>;<ORIGINAL_HTTP_URL>

which needs to be parsed. However, it contains all the information that PATH_INFO contains, without the incorrect decoding.
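A possible way to take that string apart is sketched below in Python. It assumes the DLL path itself contains no '?' and that the status code is a plain integer; the type and function names, as well as the sample value, are hypothetical.

```python
from typing import NamedTuple


class ErrorHandlerRequest(NamedTuple):
    dll_path: str
    status_code: int
    original_url: str   # still percent-encoded


def parse_error_handler_url(http_url: str) -> ErrorHandlerRequest:
    """Split '<DLL_PATH>?<STATUS_CODE>;<ORIGINAL_HTTP_URL>' into its parts."""
    # Split only at the first '?' and the first ';' so that any '?' or ';'
    # inside the original URL is left untouched.
    dll_path, question, rest = http_url.partition("?")
    status, semicolon, original_url = rest.partition(";")
    if not question or not semicolon or not status.isdigit():
        raise ValueError("not an error handler URL: %r" % http_url)
    return ErrorHandlerRequest(dll_path, int(status), original_url)


if __name__ == "__main__":
    # Hypothetical sample value, for illustration only.
    print(parse_error_handler_url("/myapp/handler.dll?404;/myapp/%D1%87/?x=1"))
```

The `original_url` field is still percent-encoded, so it needs the same percent-decoding and UTF-8 interpretation as a regular HTTP_URL value.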

Note: Getting PATH_INFO using GetServerVariable, rather than from the EXTENSION_CONTROL_BLOCK structure, does not solve the encoding problem.

Solution for wfastcgi

Server variables are encoded using the system locale (called 'mbcs' in Python) by default. This behavior can be changed by setting a registry key:

reg add HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\w3svc\Parameters /v FastCGIUtf8ServerVariables /t REG_MULTI_SZ /d REQUEST_URI\0PATH_INFO

Note that this is a machine-wide setting: it affects all FastCGI applications on the same server, not only this one, and may break existing applications that do not expect these variables to be UTF-8-encoded (rather unlikely, as any sane application that uses non-ASCII URLs would use UTF-8 encoding...).
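On the Python side, a WSGI application might then read the path as in the following sketch. It assumes the variables arrive following the PEP 3333 convention, where raw bytes are exposed as a Latin-1-decoded string; whether wfastcgi does this or already hands over decoded text is not confirmed here, so the hypothetical helper `wsgi_text` falls back to the unchanged value when the round trip fails.

```python
def wsgi_text(value: str) -> str:
    """Recover UTF-8 text from a PEP 3333 style 'bytes-as-latin-1' string."""
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumption: the value was already decoded text, so keep it as-is.
        return value


def application(environ, start_response):
    """Minimal WSGI app that echoes the decoded request path."""
    path = wsgi_text(environ.get("PATH_INFO", ""))
    body = ("Requested path: " + path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Using `environ.get` avoids a `KeyError` for variables the server does not pass through.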

See also https://support.microsoft.com/en-us/help/2277918/fix-a-php-application-that-depends-on-the-request-uri-server-variable

Florian Winter