Are fread(), file() and file_get_contents() functions secure for the server when third-party external references are given as filename?

Question

Introduction

I'm asking this question because everything I've found on this topic focuses on the risks of variable $filename values (which I already understand, along with the string handling needed to use them securely). I also found a question about secure API use, but it ended with the OP asking for a cURL version of his code.

I'd like to know whether the fopen()/fread(), file() and file_get_contents() functions are secure by themselves when interacting with third-party websites through streams with an external $filename reference (i.e. external links such as https://www.example.com instead of internal file paths like includes/my-file.php).

Example

What would be the security issues, for the server, of the following simple application, which opens a given site's content, replaces every "foobar" occurrence in its source code with "barbaz", and prints the result to the user?

<?php
// User given URL.
$user_url = $_POST['url'] ?? null;

// Conditional content.
if (is_string($user_url) && preg_match('/^https?:\/\/.+$/', $user_url)) {
    // Print content with replacements.
    print_r(str_replace("foobar", "barbaz", file_get_contents($user_url)));
} else {
    // Warning.
    if (is_string($user_url)) echo '<p>' . "You inserted an invalid URL :(" . '</p>';
    // Form.
    echo '<form method="post" action="">' .
        '<label for="url-field">Your URL:</label>' .
        '<input id="url-field" name="url" type="text">' .
        '<button type="submit">Do it!</button>' .
        '</form>';
}

Again, I am aware that preg_match('/^https?:\/\/.+$/', $user_url) is an oversimplified filter, and that other characteristics of the given string (such as its length) should be checked as well.
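A slightly stricter filter than that regex might look like the sketch below. The helper name `is_acceptable_url()` and the 2048-character limit are illustrative assumptions, not values taken from PHP or from the question; the sketch relies on the built-in `filter_var()` with `FILTER_VALIDATE_URL` plus `parse_url()`, and it only allows the http/https schemes.

```php
<?php
// Hypothetical helper: checks only the *shape* of the string. It says
// nothing about which host the URL points to.
function is_acceptable_url(mixed $url, int $maxLength = 2048): bool
{
    // Reject non-strings, empty strings and overly long input
    // (2048 is an arbitrary, assumed limit).
    if (!is_string($url) || $url === '' || strlen($url) > $maxLength) {
        return false;
    }
    // FILTER_VALIDATE_URL rejects strings that are not well-formed URLs.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $parts = parse_url($url);
    $scheme = strtolower($parts['scheme'] ?? '');
    // Allow-list the scheme so file://, php://, ftp:// etc. never reach
    // file_get_contents(), even though FILTER_VALIDATE_URL accepts some of them.
    return in_array($scheme, ['http', 'https'], true) && !empty($parts['host']);
}
```

In the form handler, `is_acceptable_url($user_url)` could stand in for the `is_string(...) && preg_match(...)` condition.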

More Details

My goal here is to understand whether the procedures PHP uses to obtain the file_get_contents(), file() or fread() content have any known vulnerabilities that the queried third-party site could exploit and that pose a risk to the application server.

I'm not concerned about the returned string, since source code with malicious JS would be a client-side problem rather than a server-side one, and since I won't write stuff like eval(file_get_contents($url)) in my code.

Also, is it correct to say that none of the client's HTTP request data (sent to my application) or cookies would be sent to the third-party website (unless explicitly done through a custom context)?

My confusion about these functions is partly due to the difference in how they handle internal server file paths versus external references. For the former, they open the file and read its lines; for the latter, they need network protocols (which I'm still a little inexperienced with) to obtain the content they return.
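One way to see that difference is to ask PHP which wrapper handled a given stream. This sketch uses `stream_get_meta_data()`, whose `wrapper_type` entry names the wrapper; `wrapper_for()` is a hypothetical helper name, and the HTTPS line is commented out because it needs network access.

```php
<?php
// Open a filename, report which stream wrapper handled it, then close it.
// Returns null if the stream could not be opened.
function wrapper_for(string $filename): ?string
{
    $handle = @fopen($filename, 'r'); // @ silences the warning on failure
    if ($handle === false) {
        return null;
    }
    $meta = stream_get_meta_data($handle);
    fclose($handle);
    return $meta['wrapper_type'] ?? null;
}

// A local path with no "scheme://" prefix goes to the plain file wrapper.
$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, "local contents\n");
echo wrapper_for($tmp), PHP_EOL; // plainfile
unlink($tmp);

// A php:// filename is dispatched to a different built-in wrapper.
echo wrapper_for('php://memory'), PHP_EOL;

// An http(s) URL would be handled by the http wrapper (needs network):
// echo wrapper_for('https://www.example.com'), PHP_EOL;
```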

These functions are secure on their own, but consider man-in-the-middle attacks — Justinas, Apr 29 '22 at 15:07
@ChrisHaas Those comments look very much like an answer to me, but nobody can vote, accept, or edit them because you posted them as comments. — IMSoP, Apr 29 '22 at 17:14

score 0 · Accepted Answer · answered Apr 29 '22 at 20:23


Most (maybe all) PHP file-based operations are done through streams, and each stream wrapper (file, http, ftp, etc.) has its own code and logic, which means each could also have its own security vulnerabilities.
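To see which wrappers a particular PHP build has registered, `stream_get_wrappers()` can be called directly. A minimal sketch; the exact list varies with the build and the loaded extensions:

```php
<?php
// Each entry is a scheme that fopen(), file(), file_get_contents() and
// friends dispatch to its own wrapper implementation.
$wrappers = stream_get_wrappers();
print_r($wrappers);

// Filenames without a "scheme://" prefix use the plain file wrapper,
// which this list reports as "file".
var_dump(in_array('file', $wrappers, true));
// "https" is only present when the openssl extension is loaded.
var_dump(in_array('https', $wrappers, true));
```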

The default stream wrappers and their corresponding handlers can be found in the source, and there's a great deep dive for implementors here, too.

You can manually register your own stream wrappers, too. In fact, you can also unregister existing wrappers, possibly even core ones (I haven't tried), so in theory you or someone else could inject vulnerable code that way.
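A minimal sketch of unregistering and restoring a built-in wrapper with `stream_wrapper_unregister()` and `stream_wrapper_restore()`. The choice of `ftp` is arbitrary; the change lasts only for the current script/request.

```php
<?php
// Disable the built-in ftp wrapper: any "ftp://..." filename will now
// fail to open instead of making a connection.
if (in_array('ftp', stream_get_wrappers(), true)) {
    stream_wrapper_unregister('ftp');
}
var_dump(in_array('ftp', stream_get_wrappers(), true)); // bool(false)

// While a scheme is unregistered, code running in the same process could
// claim it with its own class via stream_wrapper_register().
// stream_wrapper_restore() brings back the original built-in handler.
stream_wrapper_restore('ftp');
var_dump(in_array('ftp', stream_get_wrappers(), true)); // bool(true)
```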

To the best of my knowledge, there are no publicly announced unpatched vulnerabilities related to these wrappers in the currently maintained versions of PHP. That's not to say that in the past there weren't, nor is it to say I know of any undisclosed ones.

To your second question: no. When a web browser visits a PHP page that uses file_get_contents against an HTTP/HTTPS stream, nothing from the browser's initial request (headers, etc.) will be added to the stream's request. What that outgoing request contains is controlled by the "stream context". The default values for an unspecified context can be inspected manually in the source for each wrapper: look for code like `context && (tmpzval = php_stream_context_get_option` and then find the corresponding `else`.
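To make the "explicitly done through a custom context" case concrete, here is a sketch of a context that adds headers on purpose. `make_context()` is a hypothetical helper; `method`, `header`, `timeout`, `follow_location` and `max_redirects` are documented options of the `http` context (which is also used for `https://` URLs), and the values chosen are arbitrary.

```php
<?php
// Build an http context. Any header, including a cookie, reaches the
// third-party site only if it is listed here.
function make_context(array $extraHeaders = [])
{
    return stream_context_create([
        'http' => [
            'method'          => 'GET',
            'header'          => implode("\r\n", $extraHeaders),
            'timeout'         => 10.0, // seconds
            'follow_location' => 1,    // follow redirects
            'max_redirects'   => 5,
        ],
    ]);
}

// Out of the box, no options are set on the default context.
print_r(stream_context_get_options(stream_context_get_default()));

$ctx = make_context(['Accept: text/html']);
print_r(stream_context_get_options($ctx));
// Network call (not run here):
// $body = file_get_contents('https://www.example.com', false, $ctx);
```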


Chris Haas


Your answer clarified a lot for me, since now I know that the `file_get_contents` and `fopen` functions rely on the `streamWrapper::stream_open` function to get their contents. I just couldn't find the native wrappers' (`http`/`file`/etc.) class definitions, which I'd like to see to better understand what their `stream_open` functions do. I know that the `http` wrapper returns the HTTP response body as its content and that it fills the `$http_response_header` local variable with the HTTP response headers. – Rafael Apr 30 '22 at 16:49
I'm not worried about the security of HTTP responses as such, but I don't know whether PHP's native `http` wrapper does anything else with them besides returning the body and filling variables with their contents. I even tried to naively clone the PHP repository and search for `class .*(http|HTTP|Http).*` and `stream_wrapper_register.*http.*` in VSCode. – Rafael Apr 30 '22 at 17:04
To be clear, they don’t rely on that class/method. Per the documentation, that class doesn’t exist; it is just a prototype. Your search seems to be for PHP code, not C, which is what PHP is written in. The code for the HTTP stream is here: https://github.com/php/php-src/blob/master/ext/standard/http_fopen_wrapper.c – Chris Haas May 02 '22 at 03:10
From what I could read (mostly the code comments, since I have no experience with C), the wrapper really just makes HTTP requests and deals with the responses. If so, I guess the security of `fopen`/`file_get_contents` indeed comes down to how the user-sent headers and the response body content are handled. – Rafael May 02 '22 at 11:37
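For the response side discussed in these comments, here is a sketch of reading `$http_response_header`, the local variable the http wrapper fills after a request. `status_code()` is a hypothetical helper that only parses the header lines; the network call is commented out, and `ignore_errors` is the documented `http` context option that returns the body even for 4xx/5xx responses.

```php
<?php
// Return the status code from raw response header lines, or null.
function status_code(array $headerLines): ?int
{
    // Each response in a redirect chain contributes its own status line
    // (e.g. "HTTP/1.1 301 ..." then "HTTP/1.1 200 OK"), so keep the last.
    $code = null;
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Network example (not run here):
// $ctx  = stream_context_create(['http' => ['ignore_errors' => true]]);
// $body = file_get_contents('https://www.example.com', false, $ctx);
// $code = status_code($http_response_header ?? []);
```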
