
I get strange behaviour when either of two uppercase Cyrillic letters, "И" or "Э", is used in a URL:

file_get_contents("http://localhost/И")
fopen("http://localhost/И", "r")

Both return the error below, even though the server does not even seem to be called:

failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error

Does anyone know whether this is a known issue? Has a bug been reported?

It seems to be fixed in PHP 8, but why does this error happen?

PS. This is not about adding headers to the request (I tried that); the call does not even happen.

Update: I checked the local nginx logs, and the server really is called. This is what I get for both symbols: PHP 7 replaces the second byte of the UTF-8 sequence with "_":

"GET /\xD0_ HTTP/1.0" 500      <---- PHP 7 and earlier
"GET /\xD0\x98 HTTP/1.0" 404   <---- PHP 8
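The two log lines can be checked in PHP to see what actually reached the server. A minimal sketch: is_valid_utf8() is a hypothetical helper built on the common preg_match('//u', ...) idiom, which fails for byte strings that are not valid UTF-8. It shows that PHP 8 sent the complete sequence for И, while PHP 7 sent a lone lead byte followed by "_".

```php
<?php
// Hypothetical helper: preg_match() with the /u modifier returns false
// when the subject is not valid UTF-8, and 1 here otherwise (empty pattern).
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Request paths exactly as they appear in the nginx log above.
$php8Path = "/\xD0\x98"; // PHP 8: both bytes of "И" (U+0418)
$php7Path = "/\xD0_";    // PHP 7: second byte replaced by "_" (0x5F)

var_dump(is_valid_utf8($php8Path)); // bool(true)
var_dump($php8Path === '/И');       // bool(true) when this file is saved as UTF-8

// 0xD0 starts a two-byte sequence and must be followed by a continuation
// byte in 0x80-0xBF; "_" is not one, so the PHP 7 path is invalid UTF-8.
var_dump(is_valid_utf8($php7Path)); // bool(false)
```

This only shows what was sent; it does not by itself explain why nginx answered 500.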

Update 2: I found that in PHP 7 the problem is not limited to these two symbols: it affects every character whose UTF-8 encoding ends in the byte 0x98 or 0xAD. Here are examples of other characters with the same behaviour:

file_get_contents("http://localhost/ϭ"); // cf ad
file_get_contents("http://localhost/Θ"); // ce 98
file_get_contents("http://localhost/Ҙ"); // d2 98
file_get_contents("http://localhost/ј"); // d1 98
file_get_contents("http://localhost/ѭ"); // d1 ad
file_get_contents("http://localhost/Ә"); // d3 98
file_get_contents("http://localhost/‘"); // e2 80 98
file_get_contents("http://localhost/ĭ"); // c4 ad
file_get_contents("http://localhost/Ę"); // c4 98
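The pattern from Update 2 can be checked by printing each character's UTF-8 bytes with bin2hex(). A minimal sketch, assuming the script file is saved as UTF-8; Ж is added for comparison and is not from the question (its last byte is 0x96, so by the pattern above it should not be affected, though that was not tested here).

```php
<?php
// Characters from the question (plus "Ж" for comparison).
$samples = ['И', 'Э', 'ϭ', 'Θ', 'Ҙ', 'ј', 'ѭ', 'Ә', '‘', 'ĭ', 'Ę', 'Ж'];

foreach ($samples as $ch) {
    $hex  = bin2hex($ch);        // UTF-8 bytes as lowercase hex, e.g. "d098"
    $last = substr($hex, -2);    // final byte of the sequence
    $flag = in_array($last, ['98', 'ad'], true) ? '  <- matches the PHP 7 pattern' : '';
    printf("%s  %s%s\n", $ch, $hex, $flag);
}
```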
nahab
  • What happens when you go to `http://localhost/И` in your browser? – Chris Haas Apr 28 '21 at 21:46
  • Have you set `ini_set('display_errors', '1'); error_reporting(E_ALL);`? If you have and the error is still 500, have you checked the webserver's logs for what is actually causing the error? :-) Also, it's UTF-16, so it would not be a bug but rather missing support :) Also check whether the URL works at all first, as @ChrisHaas suggested. – jave.web Apr 28 '21 at 21:48
  • Out of curiosity what OS are you running? What happens if you change "r" to "rb"? – donatJ Apr 28 '21 at 21:53
  • `500 Internal Server Error` is a message generated by the server, so the call happens even if you think it doesn't. Look at the server error log for a more meaningful message. – Tangentially Perpendicular Apr 28 '21 at 22:03
  • @ChrisHaas In the browser it works fine and returns the files. As I said, this is a problem in PHP itself, not a network or other server-side problem – nahab Apr 28 '21 at 22:11
  • @donatJ I am running on Windows. I've also downloaded PHP 5.5.6, 7.3, 7.4 and 8, and only PHP 8 works for these two particular letters: both PHP 7 versions give this error, and PHP 5.5.6 fails too – nahab Apr 28 '21 at 22:15
  • Future developers will be very grateful if you edit your question and add the *text* of that nginx error as *text*, not an image. https://meta.stackoverflow.com/a/285557/6089612 – Don't Panic Apr 29 '21 at 06:03

1 Answer


Because http://localhost/И is a malformed URL. URLs may only contain ASCII characters, so you need to urlencode the path components that contain code points above 127. Your browser, and potentially some HTTP libraries, do this transparently, but PHP's file/stream functions will not do it for you.

// because this is what I copy/pasted off of SO, which is UTF8
$in_8  = 'И';
// your endianness may vary
$in_16 = mb_convert_encoding($in_8, 'UTF-16LE', 'UTF-8');

$url_8  = 'http://example.com/'.urlencode($in_8);
$url_16 = 'http://example.com/'.urlencode($in_16);

var_dump(
    bin2hex($in_8),
    $url_8,
    bin2hex($in_16),
    $url_16
);

Output:

string(4) "d098"
string(25) "http://example.com/%D0%98"
string(4) "1804"
string(25) "http://example.com/%18%04"
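For a full URL, one way to apply this is to encode each path segment separately so the "/" separators survive. The sketch below uses a hypothetical helper, encode_url_path(), and swaps in rawurlencode() for urlencode(): both give %D0%98 for И, but rawurlencode() follows RFC 3986 and encodes a space as %20, whereas urlencode() produces "+", which is only meaningful in query strings. Since the encoded URL is pure ASCII, it should also sidestep the byte mangling seen in the question's PHP 7 logs.

```php
<?php
/**
 * Hypothetical helper: percent-encode every segment of a URL path while
 * keeping the "/" separators, e.g. "/docs/И" -> "/docs/%D0%98".
 * Assumes $path is UTF-8 and not already percent-encoded.
 */
function encode_url_path(string $path): string
{
    $segments = explode('/', $path);
    return implode('/', array_map('rawurlencode', $segments));
}

$url = 'http://localhost' . encode_url_path('/И');
echo $url, "\n"; // http://localhost/%D0%98

// The two encoders differ for spaces.
echo rawurlencode('a b'), ' vs ', urlencode('a b'), "\n"; // a%20b vs a+b

// The request would then use the encoded URL, e.g.:
// $body = file_get_contents($url);
```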
Sammitch