0

I'm having a problem displaying images with Greek filenames (e.g. 'φωτογραφία.jpg') in a browser. Using the script below, I found out which two encodings I need to pass to iconv() so that the filename displays correctly in a browser. The image itself, though, fails to render.

<?php
$file = 'φωτογραφία.jpg';
$encodings = array("UTF-8", "ASCII", "Windows-1253", "ISO-8859-1", "UTF-16");
$iconv = "";
// Try every ordered pair of distinct encodings and collect each conversion result.
foreach ($encodings as $from) {
  foreach ($encodings as $to) {
    if ($to !== $from) $iconv .= "<br /> $from -> $to: ".iconv($from, $to, $file);
  }
}
echo $iconv;
?>

Working link here, which returns the correct filename when converting from UTF-8 -> Windows-1253.

The environment is PHP 5.2.17 on Apache/2.2.22 (Unix) and the files have been uploaded from a Windows machine. Currently, I've only tested 2-3 images by hardcoding them into a test PHP file. Do you think it would be different if the filenames were pulled from a database query?

bikey77
  • What are you actually doing with the filenames? – Jon Jul 11 '12 at 10:09
  • They are product images, need to display them in an eshop. – bikey77 Jul 11 '12 at 10:22
  • What are you doing *code-wise*... :) – Jon Jul 11 '12 at 10:24
  • I'm simply trying to find the proper conversion to get them to show in a browser. If I do, then I will pass the filenames to a database so I can pull them from there. (Am I missing the point of your question? I feel like I am!) – bikey77 Jul 11 '12 at 10:34

2 Answers

1

URLs are not likely to work with literal multibyte characters in them. You need to pass them through urlencode() in order to get sensible results.

E.g.

$file = 'φωτογραφία.jpg';
echo '<p><a href="'.urlencode($file).'" target="_self"><img src="'.urlencode($file).'" width="100" height="100" border="1"></a></p>';

This produces HTML that looks something like:

<p><a href="%CF%86%CF%89%CF%84%CE%BF%CE%B3%CF%81%CE%B1%CF%86%CE%AF%CE%B1.jpg" target="_self"><img src="%CF%86%CF%89%CF%84%CE%BF%CE%B3%CF%81%CE%B1%CF%86%CE%AF%CE%B1.jpg" width="100" height="100" border="1"></a></p>
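
If the image lives in a subdirectory, or its name can contain spaces, a slightly more defensive variant may help. The sketch below uses rawurlencode() on each path segment (urlencode() turns spaces into `+`, which is meant for query strings rather than paths) and leaves the `/` separators alone. The helper name `encode_path` and the `images/` directory are assumptions for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: percent-encode each segment of a relative path,
// keeping the "/" separators intact.
function encode_path($path) {
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// Assumes this source file is saved as UTF-8 and the file on disk has a UTF-8 name.
$file = 'images/φωτογραφία.jpg';

// The encoded path is plain ASCII; htmlspecialchars() guards the attribute value.
$src = htmlspecialchars(encode_path($file), ENT_QUOTES);
echo '<img src="' . $src . '" width="100" height="100">';
```

This avoids closures, so it should also run on older PHP versions such as the 5.2.x series mentioned in the question.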
DaveRandom
  • I understand what you mean but it isn't working, returns ÏÏÏογÏαÏία.jpg – bikey77 Jul 11 '12 at 10:35
  • Sorry, how do you mean? Where do you see the �s? The above code should produce all ASCII characters, and should leave it up to the server to translate this to the correct binary sequences... – DaveRandom Jul 11 '12 at 10:36
  • @bikey77 can you show the full source code for that PHP script, including the line where you echo the `<img>` tag? – DaveRandom Jul 11 '12 at 10:38
  • Sorry for the confusion, I edited my 1st comment. I have used your code, changed nothing at all. – bikey77 Jul 11 '12 at 10:40
  • That seems odd, when I view the source code of the file I see literal multibyte characters and not `%xx` sequences... Since Windows (the source of the files) [stores LFN in unicode](http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx) I would expect you to simply need to declare the document as UTF-8 and it would work. Try [this script](http://codepad.org/gQZIYwgV) and see what happens. – DaveRandom Jul 11 '12 at 10:49
  • Please have a look: http://www.ht-webcreations.com/test/glob.php If I change the encoding to iso-8859-7 it outputs the filenames correctly and I don't mind that encoding. I don't get it though, why is glob working correctly when the other code doesn't? – bikey77 Jul 11 '12 at 10:54
  • @bikey77 Well I see rendered images... I don't see correct characters though. How does the text output look to you? (Your original code didn't produce the correct text output for me either, but I suspect that's because my browser is not set up right on this machine.) – DaveRandom Jul 11 '12 at 10:56
  • You are serving your page as UTF-8, but the characters inside it are encoded as cp1253. If those strings are coming literally from inside the PHP, you need to set your editor to save in UTF-8 instead of cp1253 (the default Windows encoding for your locale). – bobince Jul 11 '12 at 21:27
0

The environment is PHP 5.2.17 on Apache/2.2.22 (Unix) and the files have been uploaded from a Windows machine.

Ah, but with what encoding did you upload them? Because WinNT filenames are native-unicode, and Unix filenames are native-bytes, the file upload process has to pick an encoding to convert between them.

Most Linux boxes interpret filenames as UTF-8 when displaying them in the shell or on the local desktop, so that's a reasonable choice. IRIs are also always UTF-8, so if you want the filename to appear as φωτογραφία.jpg in the browser address bar, that's the encoding to go for. In that case your URI-encoded version would be %cf%86%cf%89%cf%84%ce%bf%ce%b3%cf%81%ce%b1%cf%86%ce%af%ce%b1.jpg.

However, some Windows tools will instead default to the 'ANSI code page', a locale-specific encoding. If you used such a tool on a Greek version of Windows, you would get cp1253; if you used it on a Western European installation you'd get cp1252, and it would break because Greek letters are not available in that encoding. If your upload tool doesn't let you specify the encoding, get a better one (e.g. WinSCP).

Whichever encoding you use, as Dave mentions (+1), you'll need to URI-encode the non-ASCII bytes.
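
As a rough way to find out which encoding the uploaded names actually ended up in, something like the following sketch could be used. It checks whether the file exists under its UTF-8 byte sequence or under its cp1253 byte sequence, and percent-encodes whichever one is found. The function name `find_image_url` and the `images` directory are assumptions for illustration, and the script itself is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: given a directory and a UTF-8 filename, return the
// percent-encoded form of whichever byte sequence exists on disk, or null.
function find_image_url($dir, $utf8Name) {
    // Candidate 1: the name was stored as UTF-8 bytes.
    $candidates = array($utf8Name);

    // Candidate 2: the upload tool stored the name as cp1253 bytes.
    $converted = @iconv('UTF-8', 'Windows-1253', $utf8Name);
    if ($converted !== false) {
        $candidates[] = $converted;
    }

    foreach ($candidates as $bytes) {
        if (file_exists($dir . '/' . $bytes)) {
            return rawurlencode($bytes);
        }
    }
    return null;
}

// dirname(__FILE__) is used because __DIR__ only arrived in PHP 5.3.
$url = find_image_url(dirname(__FILE__) . '/images', 'φωτογραφία.jpg');
if ($url !== null) {
    echo '<img src="images/' . $url . '" width="100" height="100">';
}
```

If only the cp1253 candidate matches, re-uploading or renaming the files to UTF-8 is probably the cleaner long-term fix, since the percent-encoded cp1253 URL will not display as Greek text in the address bar.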

bobince