1

From the browser I'm trying to access the file https://example.com/86/86454cff-556a-4162-aa65-433158c133f4/Informacja+kwartalna++III+kwartał+2016+r. and I'm getting a 404 error.

When I check the filesystem, the file exists with percent-encoded characters in its name: 86/86454cff-556a-4162-aa65-433158c133f4/Informacja+kwartalna++III+kwarta%C5%82+2016+r.

When I turn directory listings on, Apache serves this file at https://example.com/86/86454cff-556a-4162-aa65-433158c133f4/Informacja+kwartalna++III+kwarta%25C5%2582+2016+r.

How can I make it work so that the file is properly served at the first address? Or should I save the file on the filesystem using UTF-8 characters, without percent-encoding? What would you suggest?

Kangur
  • There are a lot of variables here, not the least being things like the filesystem / VFS. You might try running strace on the httpd processes to see what they are actually looking for, and you can set a (selective) LogLevel debug inside a per-directory or per-location section (LocationMatch may be the easiest to make specific) to see if that gives you any clues. – Cameron Kerr May 28 '17 at 23:27
  • URLs (paths at least; host portions have IDNA available) can only be sent over the Internet using the ASCII character set (see https://www.w3schools.com/html/html_urlencode.asp). If a URL presented to a browser contains anything outside that set, the browser/client will percent-escape the UTF-8 encoded byte sequence of that character, so ł becomes %C5%82. If a literal non-ASCII character were presented to a server (reverse proxies, etc.), I would expect varying drops/errors. So I think your question really boils down to whether you can use literal non-ASCII characters in the names of filesystem objects. – Cameron Kerr May 28 '17 at 23:56

2 Answers

3

The filename shouldn't have any URL escaping in your filesystem. URL encoding is only relevant to the HTTP request sent from the client to the server, because that part of the exchange can only carry ASCII characters.

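When Apache or any other web server receives the request, it first decodes the URL encoding and only then looks up the filename. That is why your file, whose name literally contains %C5%82, is only reachable when the percent signs themselves are escaped as %25.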
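To illustrate the decoding step, here is a minimal sketch using Python's standard `urllib.parse`. It only mimics the percent-decoding a web server applies to the request path; it is not Apache's actual code, and the helper names `path_looked_up` and `url_for_filename` are made up for this example.

```python
from urllib.parse import quote, unquote


def path_looked_up(request_path):
    """Percent-decode a request path, roughly as a web server does
    before it searches the filesystem. '+' is left alone, because in
    the path part of a URL it is a literal plus sign."""
    return unquote(request_path)


def url_for_filename(name):
    """Percent-encode an on-disk name so that decoding it yields the
    same name again. '+' and '/' are kept as they are."""
    return quote(name, safe="/+")


# What the browser sends for the first address (ł is escaped as UTF-8 bytes).
sent_by_browser = "Informacja+kwartalna++III+kwarta%C5%82+2016+r."

# The name currently stored on disk: the escape is part of the name itself.
name_on_disk = "Informacja+kwartalna++III+kwarta%C5%82+2016+r."

# The server decodes the request and looks for a name containing a real "ł" ...
wanted = path_looked_up(sent_by_browser)
print(wanted)

# ... which differs from the stored name, hence the 404.
print(wanted == name_on_disk)

# To reach the stored name, the '%' signs themselves must be escaped,
# which is the URL the directory listing produced.
print(url_for_filename(name_on_disk))
```

The last line prints the same `%25C5%2582` form that the directory listing shows, which suggests that the stored name, not Apache, is what is off.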

Of course, there are several ways to encode Unicode characters. However, filename encoding is handled by the system libraries / filesystem and is the same for all applications. Therefore, you only need to check that the filename in the directory is the same as the one you want to use in the browser URL.
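If many files were stored with escapes in their names, one way to bring them in line is a small rename script. The following is only a sketch, assuming the names are percent-encoded UTF-8 and that the filesystem accepts UTF-8 names; `decode_filenames` is a hypothetical helper, and it defaults to a dry run so you can review the plan first.

```python
import os
import re
from urllib.parse import unquote

# Matches a single percent escape such as %C5 or %82.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def decode_filenames(root, dry_run=True):
    """Find files under `root` whose names contain percent escapes and
    rename them to the decoded form. Returns the (old, new) path pairs.
    With dry_run=True nothing is renamed."""
    planned = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not PERCENT_ESCAPE.search(name):
                continue
            try:
                # errors="strict" rejects escapes that are not valid UTF-8.
                decoded = unquote(name, errors="strict")
            except UnicodeDecodeError:
                continue
            # Skip results that are not usable as a single file name.
            if "/" in decoded or "\0" in decoded or decoded in ("", ".", ".."):
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, decoded)
            # Never overwrite an existing file.
            if os.path.exists(new):
                continue
            planned.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return planned
```

Run it once with the default `dry_run=True`, inspect the returned pairs, and call it again with `dry_run=False` only on a backed-up copy of the tree.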

If there is a CMS involved, its implementation has to support Unicode filenames properly. Unfortunately, support for Unicode filenames in CMSs is often poor, and you may need to restrict yourself to ASCII filenames for things to work reliably.

Tero Kilkanen
-2

I believe what you are looking for is .htaccess and symbolic links. I don't recall what the exact directives are.