0

I have a tool which allows a user to upload a spreadsheet, and then I parse the spreadsheet with Laravel-Excel. My issue is, how can I check that the file is a valid excel file before attempting to parse it?

I looked in the PHPOffice/Laravel-Excel docs, and could not find a method for validating a file. So, my next guess was, if I attempt to Load() an invalid file, it will bomb out and give me a warning or error. However, rather than doing that, it will parse the file and try to somehow convert it to a spreadsheet. For example, I fed it a pdf and it did generate a collection containing whatever non-binary junk it could find in the pdf file. This is not desirable.

Currently, I am doing a mime-type check to validate the file.

//valid mime types

$mimeTypes = [
    'application/csv', 'application/excel',
    'application/vnd.ms-excel', 'application/vnd.msexcel',
    'text/csv', 'text/anytext', 'text/plain', 'text/x-c', 
    'text/comma-separated-values',
    'inode/x-empty',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
             ];
$file = request()->hasFile('file');
if ($file) {
    if (in_array(request()->file('file')->getClientMimeType(), $mimeTypes)) {
        //then parse the file
        Config::set('excel.import.heading', 'original');
        $data = Excel::load(request()->file('file')->path(), function ($reader) {
        })->get();
         //do some stuff with data...
    } else {
        //invalid file
    }
} else {
    //no file uploaded
}

This is not ideal, since there seems to be an exotic variety of possible mime types, so I would have to actively maintain the list, and certain csv files have a plaintext mime-type, so non-csv-plaintext files would pass muster here. Is there any standard way, provided by either Laravel, Laravel-Excel, or PHPOffice, to validate the file?

chiliNUT
  • 18,989
  • 14
  • 66
  • 106
  • don't chain method calls? If the load fails, you should check for that first, before trying to call ->get() on it. Can't find anything in the laravel excel docs about exceptions/return values, but at bare minimum, you could try `$e = Excel::load(...); var_dump($e)` and see if anything looks like an error message/code – Marc B Sep 09 '16 at 18:55
  • They look nearly the same, meaning they both produce a "valid" collection. I uploaded a `png`, and in the dump of `Excel::Load()` it interpreted it as a csv with 2 rows and 2 columns. Differences are that the values have numerical keys (they would typically be strings, column headings in the excel file) and the data in the cells is whitespace/control characters (junk) however I can't actually check for that until I run `get` because print_r is returning those values using reflection. (also var_export doesn't work because the structure is recursive) – chiliNUT Sep 09 '16 at 19:44
  • acutally, there are probably getter methods to get that data, but its still not great that I have to spot check the data returned, rather than just being able to check if there was an error or not – chiliNUT Sep 09 '16 at 19:59

1 Answers1

0

This is not a MIME check on the file!

You are only checking the user-supplied file extension information here.

string|null getClientMimeType()

Returns the file mime type.

The client mime type is extracted from the request from which the file was uploaded, so it should not be considered as a safe value.

For a trusted mime type, use getMimeType() instead (which guesses the mime type based on the file content).

Symfony Component HttpFoundation File UploadedFile

If you want to validate the magic MIME bytes in the uploaded file itself you can do this by relying on your operating system's magic MIME byte file like this in PHP.

$finfo = finfo_open(FILEINFO_MIME_TYPE); // return mime type ala mimetype extension
// $mime will contain the correct mime type
$mime = finfo_file($finfo, $_FILES['your-form-upload-input-name-here']['tmp_name'];
finfo_close($finfo);

See finfo_file for more details.

To do this specifically using the Symfony Component you're currently relying on call the getMimeType() method instead. Note the difference is that getClientMimeType() uses the client-supplied information which can't be trusted, whereas getMimeType() does the same thing as demonstrated with finfo_file() above.

Community
  • 1
  • 1
Sherif
  • 11,786
  • 3
  • 32
  • 57
  • So I'm using the wrong method to get mime type. Even if I use the "correct one" with `finfo_file`, this doesn't at all answer my question. – chiliNUT Feb 11 '18 at 02:18
  • I’m not sure how you arrived at that conclusion. My answer was crystal clear. Either you can use the ‘getMimeType’ method with your framework or you can use finfo_file. What you are doing right now is neither of those things. You’re calling getClientMimeType which doesn’t extract the Mime type from the file at all. It just gives you whatever Mime type the clients claims the file to be. – Sherif Feb 12 '18 at 16:43
  • My question was if there was a standard way to have Laravel Excel/PHP Excel validate the file. I showed my attempt at checking the mime type. You (*correctly*) pointed out that I was doing it wrong and provided the correct way to check the mime type. However, I feel that does not answer the question of *"how to validate if the file is a spreadsheet without going through an exhaustive list of mime types"* – chiliNUT Feb 12 '18 at 17:50