Introduction
I'm asking this question because everything I have found on this topic focuses on the risks of variable $filename values (which I already understand, along with the string handling required for secure use). I also found a question about secure API use, but it ended with the OP asking for a cURL version of his code.
I'd like to know whether the fread(), file() and file_get_contents() functions are secure by themselves when interacting with third-party websites through streams, with an external $filename reference (i.e. external links such as https://www.example.com instead of internal file paths like includes/my-file.php).
Example
What would be the security issues, for the server, of the following simple application, which opens the content of a given site, replaces every occurrence of "foobar" in its source code with "barbaz", and prints the result to the user?
// User-supplied URL.
$user_url = $_POST['url'] ?? null;
// Fetch and print the content only if the URL passes the filter.
if (is_string($user_url) && preg_match('/^https?:\/\/.+$/', $user_url)) {
    // Print content with replacements.
    echo str_replace("foobar", "barbaz", file_get_contents($user_url));
} else {
    // Warning for a submitted but invalid URL.
    if (is_string($user_url)) {
        echo '<p>You inserted an invalid URL :(</p>';
    }
    // Form.
    echo '<form method="post" action="">' .
        '<label for="url-field">Your URL:</label>' .
        '<input id="url-field" name="url" type="text">' .
        '<button type="submit">Do it!</button>' .
        '</form>';
}
Again, I am aware that preg_match('/^https?:\/\/.+$/', $user_url) is an oversimplified filter, and that other characteristics of the given string (such as its length) should be checked as well.
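For reference, a somewhat stricter check could look like the sketch below. It is only an illustration under stated assumptions: the function name is_acceptable_url and the 2048-character limit are hypothetical, and the check covers URL syntax only, not where the host actually points.

```php
<?php
// Hypothetical helper: the name and the default length limit are assumptions.
function is_acceptable_url($url, int $maxLength = 2048): bool
{
    // Reject non-strings, empty strings and over-long input before parsing.
    if (!is_string($url) || $url === '' || strlen($url) > $maxLength) {
        return false;
    }
    // FILTER_VALIDATE_URL checks the general URL syntax only.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host = parse_url($url, PHP_URL_HOST);
    // Allow only http and https, and require a host component.
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true)
        && is_string($host)
        && $host !== '';
}
```

With this helper, the condition in the example would become if (is_acceptable_url($user_url)).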
More Details
My goal here is to understand whether the procedures PHP uses to obtain the content for file_get_contents(), file() or fread() have any known vulnerabilities that the requested third-party site could exploit and that would put the application server at risk.
I'm not concerned about the returned string, since source code containing malicious JS would not be a server-side problem, and since I won't write things like eval(file_get_contents($url)) in my code.
Also, is it correct to say that none of the client's HTTP request data (sent to my application) or cookies would be forwarded to the third-party website (unless explicitly done through a custom context)?
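To make "explicitly done through a custom context" concrete, here is a minimal sketch of such a context. All option values are illustrative assumptions. As far as I understand the http wrapper, extra request headers come from the header option, so a Cookie header would have to be written there by the script itself.

```php
<?php
// All option values below are illustrative assumptions, not recommendations.
$options = [
    'http' => [
        'method'          => 'GET',
        // Only the headers listed here are added by this script.
        'header'          => "Accept: text/html\r\n",
        'timeout'         => 5, // seconds before the read gives up
        'follow_location' => 0, // do not follow redirects
    ],
];
$context = stream_context_create($options);

// Inspect what the context holds, without any network access.
$stored = stream_context_get_options($context);
echo $stored['http']['header'];

// Hypothetical use, with $user_url being the validated URL from the example:
// $content = file_get_contents($user_url, false, $context);
```

The call to file_get_contents() is left commented out so the sketch runs without touching the network.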
My confusion about these functions is partly due to the difference in how they handle internal server file paths and external references. With the former, they open the file and read its lines; with the latter, they need internet protocols (with which I'm still a little inexperienced) to fetch the content they return.