4

I am trying to check with PHP whether a file exists. The file in question has an accented character (é) in its name: 13067-AP-03 A - Situation projetée.pdf.

The code I use to check whether the file exists is:

$filename = 'C:/13067-AP-03 A - Situation projetée.pdf';

if (file_exists($filename)) 
{
    echo "The file exists";
} else 
{
    echo "The file does not exist";
}

The problem is that whenever I check whether the file exists, I get the message that it does not exist. If I remove the é from the file name, I get the message that the file does exist.

It looks like PHP somehow doesn't recognize the file if its name contains an accented character. I tried the following:

urlencode($filename);
addslashes($filename);
utf8_encode($filename);

None of which worked. I also tried:

setlocale(LC_ALL, "en_US.utf8");

It may be worth noting that when I read the filename straight from PHP, I get the following: 13067-AP-03 A - Situation projet�e.pdf

I have to do the following to have the filename displayed correctly:

$filename = iconv( "CP437", 'UTF-8', $filename);
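
To make the encoding mismatch concrete, here is a small sketch that prints the bytes used for "é" in a few encodings. The helper name bytes_in is hypothetical, and whether CP437 is available depends on the iconv implementation PHP was built with, so the sketch reports null instead of failing when an encoding is unknown.

```php
<?php
// "é" written as explicit UTF-8 bytes, so the result does not depend on
// the encoding this script file happens to be saved in.
$eAcute = "\xC3\xA9";

/**
 * Hypothetical helper: return the bytes of $utf8 in $encoding as
 * upper-case hex pairs, or null if iconv cannot do the conversion.
 */
function bytes_in(string $utf8, string $encoding): ?string
{
    $converted = @iconv('UTF-8', $encoding, $utf8);
    if ($converted === false) {
        return null;
    }
    return strtoupper(implode(' ', str_split(bin2hex($converted), 2)));
}

foreach (['UTF-8', 'CP437', 'CP1252'] as $encoding) {
    // One line per encoding, e.g. "UTF-8: C3 A9"
    echo $encoding, ': ', bytes_in($eAcute, $encoding) ?? 'unsupported', "\n";
}
```

The same character is two bytes in UTF-8 and a single, different byte in each of the two code pages, which is why a name typed into a UTF-8 script does not match a name expected in a code page.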

I was wondering if someone had the same problem before and could help me out with this one. All help is greatly appreciated.

For those who are interested, the script runs on a Windows machine.

Strangely, this worked: I copied all the source code from Sublime Text 3 into Notepad and saved it from Notepad, overwriting the PHP file.

Now when I check whether the file exists, it reports that the following filename exists:

13067-AP-03 A - Situation projet�e.pdf

The only remaining problem is that I want to download the file using file_get_contents, but file_get_contents doesn't interpret the � as an é.
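
One possible workaround, sketched under assumptions: instead of typing the accented name into the script, look the file up by its ASCII-only prefix and pass the name exactly as PHP returned it to file_get_contents. The function find_by_prefix is hypothetical, and this only helps if the directory listing returns a name that the file functions accept back, which is not guaranteed on every Windows build of PHP.

```php
<?php
/**
 * Hypothetical helper: return the full path of the first entry in $dir
 * whose name starts with $asciiPrefix, using the raw name bytes that
 * scandir() returned, or null if nothing matches.
 */
function find_by_prefix(string $dir, string $asciiPrefix): ?string
{
    if (!is_dir($dir)) {
        return null;
    }
    $entries = scandir($dir);
    if ($entries === false) {
        return null;
    }
    foreach ($entries as $entry) {
        if (strncmp($entry, $asciiPrefix, strlen($asciiPrefix)) === 0) {
            return $dir . DIRECTORY_SEPARATOR . $entry;
        }
    }
    return null;
}

// Usage with the directory and prefix from the question.
$path = find_by_prefix('C:', '13067-AP-03 A');
if ($path !== null) {
    $contents = file_get_contents($path);
}
```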

Quartermain
  • Something that might be relevant: on what server does your script fail? Is it a windows machine? Linux? – Lars Ebert Apr 22 '15 at 07:53
  • If I understand your problem correctly, you do find the file if you look for 13067-AP-03 A - Situation projetee.pdf (without the accent). If that is the case, you might try running a string replace or regexp over the input strings before searching? – Burki Apr 22 '15 at 07:54
  • It works correctly on my machine. What's the encoding of your PHP script and what's the value of your `default_charset` setting in php.ini? – Frederick Zhang Apr 22 '15 at 07:57
  • Check if the php file itself is UTF-8 encoded as well. If it isn't then the filename in the php script will be different than in the filesystem. – Marius Apr 22 '15 at 07:58
  • Did you try the reverse `iconv` before calling `file_exists`? It looks like the filesystem functions expect 437 while your source file is UTF-8. – Jon Apr 22 '15 at 08:16
  • @LarsEbert, the script runs on a Windows machine. – Quartermain Apr 22 '15 at 08:34
  • @Burki, How can I replace the filename of a file that my script doesn't find? – Quartermain Apr 22 '15 at 08:35
  • @FrederickZhang, the encoding of my PHP script is utf-8. The default_charset of php.ini is default_charset = utf-8. – Quartermain Apr 22 '15 at 08:37
  • Quartermain, maybe you should let us know how exactly you are searching for the file. In the example code you gave, you provided a string and used it to search for a file. My comment suggested changing the string you provided, but from your question I gather your actual search works differently? – Burki Apr 22 '15 at 08:52
  • If it is possible, I would try to enforce ASCII file names. – Martin Thoma Apr 22 '15 at 09:30

3 Answers

1

I think this is a problem with PHP on Windows. I downloaded a Windows binary of PHP onto my Windows machine, which is set to Japanese, and successfully reproduced your problem.

According to https://bugs.php.net/bug.php?id=47096:

So, if you have a generic name of a file (along with its path) as a Unicode string $u (for example UTF-8 encoded) and you want to try to save it with that name under Windows, you must first check the current locale calling setlocale(LC_CTYPE, 0) to retrieve the current code page, then you must convert $u to an array of bytes according to the code page; if one or more code points have no counterpart in the current code page, the file cannot be saved with that name from PHP. Dot.
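
The steps described in that quote could be sketched roughly as follows. Both helper names are hypothetical. The sketch assumes the Windows locale string ends in a numeric code page (for example "French_France.1252") and that iconv knows that code page; if either assumption fails, it returns null rather than guessing.

```php
<?php
/**
 * Hypothetical helper: extract "CP<number>" from a Windows-style locale
 * string such as "French_France.1252". Returns null for other formats.
 */
function code_page_from_locale(string $locale): ?string
{
    return preg_match('/\.(\d+)$/', $locale, $m) === 1 ? 'CP' . $m[1] : null;
}

/**
 * Hypothetical helper: convert a UTF-8 string to $codePage, or return
 * null if any character has no counterpart there. The round trip guards
 * against iconv builds that substitute a placeholder instead of failing.
 */
function to_code_page(string $utf8, string $codePage): ?string
{
    $bytes = @iconv('UTF-8', $codePage, $utf8);
    if ($bytes === false) {
        return null;
    }
    return @iconv($codePage, 'UTF-8', $bytes) === $utf8 ? $bytes : null;
}

// Passing "0" only queries the current locale; it does not change it.
$locale   = setlocale(LC_CTYPE, '0');
$codePage = is_string($locale) ? code_page_from_locale($locale) : null;

$name   = "C:/13067-AP-03 A - Situation projet\xC3\xA9e.pdf"; // é as UTF-8 bytes
$fsName = $codePage !== null ? to_code_page($name, $codePage) : null;

// False when the name cannot be expressed in the code page or the file is absent.
var_dump($fsName !== null && file_exists($fsName));
```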

My code page is CP932; you can check yours by running chcp in cmd.

So the code would be expected to look like this:

$filename='C:\Users\Frederick\Desktop\13067-AP-03 A - Situation projetée.pdf';
$filename=mb_convert_encoding($filename, 'CP932', 'UTF-8');
var_dump($filename);
var_dump(file_exists($filename));

But this won't work! Why? Because CP932 doesn't contain the character é!

According to https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396:
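
A quick way to see the failure, assuming the mbstring extension is loaded: convert "é" to CP932 and back, then compare the result. mb_convert_encoding does not report an error for an unmappable character; it inserts its substitute character (typically `?`) instead, so the round trip no longer matches.

```php
<?php
// "é" as explicit UTF-8 bytes, independent of the script file's encoding.
$eAcute = "\xC3\xA9";

// CP932 has no slot for é, so mbstring substitutes a placeholder.
$converted = mb_convert_encoding($eAcute, 'CP932', 'UTF-8');
var_dump($converted);

// Converting back does not restore the original character.
$roundTrip = mb_convert_encoding($converted, 'UTF-8', 'CP932');
var_dump($roundTrip === $eAcute);
```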

NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set.

Windows itself uses UTF-16LE, which Microsoft calls Unicode, to store its file names. But PHP doesn't support UTF-16LE encoded file names.

In conclusion, I unfortunately cannot find any way to solve the problem on Windows other than avoiding those characters when naming the files. I also do not think that the PHP team will solve the problem in the future.
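
As an illustration of one practical obstacle (a sketch, not a statement about PHP internals): UTF-16LE text contains NUL bytes even for plain ASCII characters, and as far as I know PHP's file functions reject paths that contain NUL bytes. The file name a.pdf below is just a hypothetical example.

```php
<?php
// "é" as explicit UTF-8 bytes, then re-encoded as UTF-16LE.
$eAcute = "\xC3\xA9";
$utf16  = iconv('UTF-8', 'UTF-16LE', $eAcute);
echo bin2hex($utf16), "\n"; // e900

// Even an ASCII-only name picks up a NUL byte after every character.
$name = iconv('UTF-8', 'UTF-16LE', 'a.pdf');
var_dump(strpos($name, "\0") !== false); // bool(true)
```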

Frederick Zhang
  • By the way, I do suggest running PHP under Linux instead of Windows. There are too many weird problems with the Windows version of PHP. For example, the size of an integer in Windows PHP is 32-bit even if you're using the 64-bit binaries. The problem is caused by the MSVC compiler, and it once confused me for a long time. Some geeks managed to compile PHP with MinGW, but it was too unstable to use. – Frederick Zhang Apr 22 '15 at 09:39
  • So there is no way around it? – Quartermain Apr 22 '15 at 09:39
  • @Quartermain I'm sorry that I cannot give you a resolution as far as I know. – Frederick Zhang Apr 22 '15 at 09:42
  • At least PHP recognizes the file now. I can use that to change the name of the file myself. Thank you. – Quartermain Apr 22 '15 at 09:45
-1

Make sure that your text editor is saving the file as "UTF-8 without BOM".

The BOM is the byte order mark, a short byte sequence placed at the start of a file. In UTF-16 it is two bytes and lets software reading the file determine whether it was saved as little-endian or big-endian; in UTF-8 it is the three bytes EF BB BF and only marks the file as UTF-8. The PHP interpreter does not treat these bytes specially, so they end up as output before the opening <?php tag. You should therefore save the file without the byte order mark.
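
If you are unsure whether your editor wrote a BOM, a check along these lines may help. For UTF-8 specifically, the mark is the three bytes EF BB BF. The constant and function names are hypothetical.

```php
<?php
// The UTF-8 byte order mark, written as explicit bytes.
const UTF8_BOM = "\xEF\xBB\xBF";

/** Hypothetical helper: true if $bytes starts with the UTF-8 BOM. */
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, strlen(UTF8_BOM)) === 0;
}

/** Hypothetical helper: return $bytes without a leading UTF-8 BOM. */
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, strlen(UTF8_BOM)) : $bytes;
}

// Usage: check the script file itself.
$source = file_get_contents(__FILE__);
var_dump($source !== false && has_utf8_bom($source));
```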

Zebra North
-2

Try this at the start of your PHP file:

<?php
header('Content-Type: text/html; charset=utf-8');
?>
João Reis