I have this URL:

http://domain.com/wp-content/uploads/2012/10/Hvilke-vilkår-følger-med-når-du-bestiller-nyt-bredbånd.png

If I connect via FTP/SSH, or just browse to that folder (Apache's directory index feature), I can see the file

Hvilke-vilkår-følger-med-når-du-bestiller-nyt-bredbånd.png

If I click the link in the Apache index, I can see the file. However, if I copy the URL and try to browse to it directly, I get this error:

The requested URL /wp-content/uploads/2012/10/Hvilke-vilkår-følger-med-når-du-bestiller-nyt-bredbånd.png was not found on this server.

My error log also says:

File does not exist: /wp-content/uploads/2012/10/Hvilke-vilk\xc3\xa5r-f\xc3\xb8lger-med-n\xc3\xa5r-du-bestiller-nyt-bredb\xc3\xa5nd.png
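The `\xc3\xa5` and `\xc3\xb8` sequences in that log line are the UTF-8 bytes of the precomposed characters "å" (U+00E5) and "ø" (U+00F8). A minimal Python sketch, assuming the log escapes every non-ASCII byte as `\xNN` (the helper name `unescape_log_path` is hypothetical), turns the logged path back into text and checks that it is in NFC form, i.e. the server was asked for the composed spelling of the name.

```python
import unicodedata

# Path as it appears in the error log; non-ASCII bytes are shown as
# \xNN escapes (assumed escaping scheme).
logged = r"/wp-content/uploads/2012/10/Hvilke-vilk\xc3\xa5r-f\xc3\xb8lger-med-n\xc3\xa5r-du-bestiller-nyt-bredb\xc3\xa5nd.png"


def unescape_log_path(s: str) -> str:
    """Hypothetical helper: turn \\xNN escapes back into bytes, then decode as UTF-8."""
    # unicode_escape maps \xNN to code point U+00NN; latin-1 maps that back to byte NN.
    raw = s.encode("ascii").decode("unicode_escape").encode("latin-1")
    return raw.decode("utf-8")


path = unescape_log_path(logged)
print(path)
print("å".encode("utf-8"))                     # b'\xc3\xa5'
print(unicodedata.is_normalized("NFC", path))  # True: "å" is the single code point U+00E5
```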

HopelessN00b

2 Answers

You probably need to normalize the filenames to Unicode Normalization Form C (NFC). See the related Stack Overflow question 12643402. One tool you could use is convmv, which should be available in CentOS.
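With convmv, the relevant option is `--nfc`; by default convmv only prints what it would do, and renames for real only when `--notest` is also given. For readers without convmv, here is a rough Python equivalent. `normalize_tree_to_nfc` is a hypothetical helper, not part of any library, and the sketch assumes a filesystem such as ext4 that stores names as raw bytes (as on the asker's CentOS server), so the NFC and NFD spellings are distinct names.

```python
import os
import unicodedata


def normalize_tree_to_nfc(root: str, dry_run: bool = True) -> list:
    """Hypothetical helper: rename entries under root whose names are not NFC.

    Returns (old_path, new_path) pairs. With dry_run=True nothing is
    renamed, similar to convmv's default test mode.
    """
    renames = []
    # Walk bottom-up so entries inside a directory are renamed before the
    # directory itself; the collected paths then remain valid.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            nfc = unicodedata.normalize("NFC", name)
            if nfc == name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, nfc)
            if os.path.exists(new):
                # Both spellings already exist as separate entries; don't overwrite.
                print(f"skipping {old!r}: {new!r} already exists")
                continue
            renames.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return renames
```

Run it once with the default `dry_run=True` to review the planned renames, then again with `dry_run=False`. If both spellings of a name already exist, the sketch skips that entry rather than overwrite a file.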

200_success
  • After reading the related Stack Overflow question, I wanted to slap myself. I just ran convmv on the folder, and now all images load fine. Thanks a lot! –  Nov 15 '12 at 23:01

RFC 3986 §2.5 recommends that non-ASCII characters first be represented in the UTF-8 character encoding, and that each byte of that encoding then be percent-encoded. However, the older RFC 2396 does not recommend any particular character encoding. Therefore, the browser's behaviour when you enter an address containing non-ASCII characters into the address bar is implementation-dependent, particularly if you are using an older browser. For example, Internet Explorer 7 on Windows and Safari 6 on OS X do not exhibit the problem you described.
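To make the ambiguity concrete, here is a Python sketch using the standard `urllib.parse.quote`, which by default encodes a string as UTF-8 and percent-encodes each non-ASCII byte, as RFC 3986 §2.5 recommends. Latin-1 is used purely as an illustrative alternative of the kind RFC 2396 leaves open: the same filename then produces a different URL.

```python
import unicodedata
from urllib.parse import quote, unquote

# Filename from the question, normalized to NFC so every character is a
# single precomposed code point (required for the Latin-1 variant).
name = unicodedata.normalize("NFC", "Hvilke-vilkår-følger-med-når-du-bestiller-nyt-bredbånd.png")

# RFC 3986 §2.5 style: UTF-8 bytes, each non-ASCII byte percent-encoded.
utf8_url = quote(name)
# Same name, different (illustrative) character encoding.
latin1_url = quote(name, encoding="latin-1")

print(utf8_url)    # Hvilke-vilk%C3%A5r-f%C3%B8lger-med-n%C3%A5r-du-bestiller-nyt-bredb%C3%A5nd.png
print(latin1_url)  # Hvilke-vilk%E5r-f%F8lger-med-n%E5r-du-bestiller-nyt-bredb%E5nd.png
```

A server has to know which encoding the client chose in order to decode the request correctly, which is why the RFC 3986 recommendation to always use UTF-8 matters.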

Your website appears to be running on the LiteSpeed Web Server, a proprietary clone of Apache. When LiteSpeed generates the link URLs in its directory index, it does not percent-encode the non-ASCII characters in the filenames, which leads to the ambiguous situation described above. In contrast, Apache 2.2.16 percent-encodes the UTF-8 representation of the filenames, so your problem would not occur on Apache. A newer version of LiteSpeed might solve your problem, and specifying `IndexOptions Charset=UTF-8` could also help. Since LiteSpeed is proprietary, I can't really help you further; you'll have to contact their technical support. Judging from the fact that your server emits `<A HREF="..."` instead of `<a href="..."`, I would guess that LiteSpeed's directory-index-generating code is not based on any recent version of Apache.
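For comparison, here is a minimal Python sketch of what a directory-index generator following RFC 3986 §2.5 might emit for each entry. `index_link` is a hypothetical helper, not Apache's or LiteSpeed's actual code: the `href` is percent-encoded UTF-8, while the visible label is only HTML-escaped.

```python
import html
from urllib.parse import quote


def index_link(filename: str) -> str:
    """Hypothetical helper: build one directory-index entry."""
    # quote() encodes as UTF-8 and percent-encodes every byte outside the
    # unreserved set, so characters like & < > " never reach the attribute raw.
    href = quote(filename)
    # The link text is shown to humans, so it only needs HTML escaping.
    label = html.escape(filename)
    return f'<a href="{href}">{label}</a>'


print(index_link("trådløst.png"))  # <a href="tr%C3%A5dl%C3%B8st.png">trådløst.png</a>
```

Note that this does not normalize anything: a filename stored in decomposed form still yields a decomposed `href`. Links clicked from the index would then work, but a URL typed or pasted in composed form would still not match the file on disk.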

200_success
  • The server is Apache, but I am migrating from a LiteSpeed web server, if that helps. –  Nov 15 '12 at 19:57
  • The link generated from the directory index is: [ DOMAIN ]/wp-content/uploads/2012/11/Hvilke-former-for-tra%cc%8adl%c3%b8st-bredba%cc%8and-findes-der1.png –  Nov 15 '12 at 19:57
  • If I view the source code, it appears as: `/wp-content/uploads/2012/11/Hvilke-former-for-trådløst-bredbånd-findes-der1.png` This is the link if I right-click -> copy link URL: `/wp-content/uploads/2012/11/Hvilke-former-for-tr%C3%A5dl%C3%B8st-bredb%C3%A5nd-findes-der1.png` As you can see, they are different (the link from the HTML vs. the link from the Apache index). –  Nov 15 '12 at 20:25
  • Fascinating. 'CC 8A' is the UTF-8 representation of Unicode U+030A (COMBINING RING ABOVE), so 'a%cc%8a' is a decomposed form of '%c3%a5'. http://en.wikipedia.org/wiki/Unicode_equivalence Why the filesystem, OS, or webserver would bother with Unicode equivalence transformations puzzles me, especially since it happens with the 'å' but not the 'ø'. Perhaps the filename is stored in decomposed form in the filesystem? Please tell us what OS, filesystem, and version of Apache you are using. – 200_success Nov 15 '12 at 21:43
  • CentOS 6.3 64-bit, Apache/2.2.15 (Unix), filesystem: `mount /dev/md2 on / type ext4 (rw) proc on /proc type proc (rw) none on /dev/pts type devpts (rw,gid=5,mode=620) /dev/md1 on /boot type ext3 (rw) /dev/md3 on /home type ext4 (rw) none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)` –  Nov 15 '12 at 22:00
  • Given the new information provided in these comments, I have started over with a new answer. – 200_success Nov 15 '12 at 22:28
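The observation in the comments above can be checked with Python's standard `unicodedata` and `urllib.parse` modules. This sketch takes the two filename URLs quoted in the comments (the directory-index link and the link from the HTML, with the domain and path prefix dropped) and shows that they name the same text only after NFC normalization. It also shows why "å" is affected but "ø" is not: only "å" has a canonical decomposition.

```python
import unicodedata
from urllib.parse import quote, unquote

# href from the directory index (decomposed: "a" + U+030A COMBINING RING ABOVE)
index_href = "Hvilke-former-for-tra%cc%8adl%c3%b8st-bredba%cc%8and-findes-der1.png"
# href from the page's HTML, as copied via the browser's context menu
html_href = "Hvilke-former-for-tr%C3%A5dl%C3%B8st-bredb%C3%A5nd-findes-der1.png"

index_name = unquote(index_href)
html_name = unquote(html_href)

print(index_name == html_name)                                # False: different code points
print(unicodedata.normalize("NFC", index_name) == html_name)  # True: same text once composed

# "å" decomposes canonically into "a" + U+030A; "ø" has no decomposition.
print(unicodedata.decomposition("å"))        # 0061 030A
print(repr(unicodedata.decomposition("ø")))  # ''
```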