
I've got a server script receiving an uploaded file from JavaScript.

Client-side, using a File object (from the W3C File API) and code similar to this line:

if (file.type.indexOf("text") == 0) { ... }

one can check the file type. Apparently, this relies on the MIME type: file.type returns MIME type strings such as "text/plain".

In my journeys through SO, I came across this worthy contributor, who maintains that MIME types are useless.

Are MIME types indeed basically useless in a file upload situation, and should any type checking therefore occur server-side?

Ben

2 Answers


That contributor maintains that all MIME type checking is useless, client or server-side.

And to some degree he's right. MIME type checking is always based on sniffing certain characteristics of a file. His example: a PDF file should start with something like %PDF-1.4. But a file that starts with %PDF-1.4 is not necessarily a PDF file. (Simplified explanation.)
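A minimal sketch of the kind of sniffing described above, assuming Node.js and a hypothetical helper named sniffIsPdf. It only compares the first five bytes against the "%PDF-" signature, so anything that merely starts with those bytes is "detected" as a PDF, whatever follows.

```javascript
// Hypothetical helper: "sniffs" a buffer for the PDF signature.
// It inspects only the leading bytes, which is exactly why it can be fooled.
function sniffIsPdf(buffer) {
  const signature = Buffer.from("%PDF-", "latin1");
  return (
    buffer.length >= signature.length &&
    buffer.subarray(0, signature.length).equals(signature)
  );
}

// Starts with the right hint, but the rest is just plain text.
const impostor = Buffer.from("%PDF-1.4\nthis is just plain text", "latin1");
const plainText = Buffer.from("hello world", "latin1");

console.log(sniffIsPdf(impostor));  // true, even though it is not a usable PDF
console.log(sniffIsPdf(plainText)); // false
```

Real MIME detectors check more hints than this, but the principle is the same: they look at characteristic places in the file, not at whether the whole file makes sense.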

A user can put all the right hints in all the right places so that a MIME detector, which only looks at those particular hints, identifies the file as some specific type. But the rest of the file could be something completely different. If you go that far, though, what makes a file a certain type at all? It's all just binary gobbledygook. In the end, the only way to make sure a file is a valid file of type X is to try to open and parse it with a parser that expects files of type X. If it parses correctly, it's a file usable as type X. If it walks like a duck, quacks like a duck...
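The "try to parse it" idea can be sketched with a format whose parser ships with Node.js. This assumes the upload is supposed to be JSON, chosen only because JSON.parse is built in; for PDFs or images you would hand the bytes to an actual PDF or image library instead. The helper names sniffLooksLikeJson and parsesAsJson are hypothetical.

```javascript
// Hypothetical sniffing check: only looks at the first non-whitespace character.
function sniffLooksLikeJson(text) {
  const first = text.trimStart()[0];
  return first === "{" || first === "[";
}

// Parsing check: hand the whole content to a real JSON parser.
// If the parser accepts it, the content is usable as JSON.
function parsesAsJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

// The hint is in the right place, but the rest of the "file" is not JSON.
const upload = '{"name": "report", then some garbage';

console.log(sniffLooksLikeJson(upload)); // true
console.log(parsesAsJson(upload));       // false
```

Sniffing is cheap and looks at a handful of bytes; parsing is slower but is the only check here that looks at the whole file the way the consuming software will.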

With that in mind, from most to least reliable: trying to parse the file, then sniffing the MIME type server-side, then sniffing the MIME type client-side, then taking the user's word for what type of file it is. Note that client-side MIME type sniffing is just as unreliable as taking the user's word for anything, since it all happens on the client.

deceze
  • When you say "trying to parse the file", you mean testing its duck-ness? And this must happen server-side, yes? – Ben Apr 26 '12 at 08:14
  • By "parsing a file" I mean *if this file is supposed to be a PDF file, let a PDF reader try to open it.* If the PDF reader comes back with "Dude, that really ain't no PDF here", then it's not a PDF file. The same goes for any other file type. JPEG files are supposed to contain data that is of use for a JPEG algorithm. If they contain anything else, it's either broken or not a JPEG file. – deceze Apr 26 '12 at 08:17
  • @deceze (8 years later) So you're basically saying "check the extension and let it through"? But then you're wrongly assuming your end user has a top-notch, up-to-date system and software that won't execute malicious PDF/BMP content, which is not always the case. If he's running Windows XP and you let him download a malicious BMP file, you share at least some of the blame, don't you? – Stavm Nov 21 '19 at 12:36
  • @Stavm I didn't say that at all. I said that checking the MIME type is *not as good as trying to parse it*, and both are better than doing anything client side. – deceze Nov 21 '19 at 12:40

The contributor is correct. You can't rely on MIME type checking alone to truly validate a file. It's only useful for quick lookups. For instance, on the client side, you can check the MIME type of a file before it is sent to the server, just in case the user chose the wrong file type, saving time and bandwidth. Apologies for the liberal use of commas!
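A minimal client-side sketch of that convenience check, in the spirit of the question's file.type.indexOf("text") == 0 line. The function isProbablyAllowed and the accepted prefixes are hypothetical; file is anything with a type string, such as a File taken from an input element. Browsers typically derive file.type from the file name and may report an empty string, so this only catches honest mistakes, and the server still has to do the real validation.

```javascript
// Hypothetical pre-upload check: saves a round trip when the user simply
// picked the wrong kind of file. It is NOT a security check, because anything
// running on the client can be bypassed.
function isProbablyAllowed(file, allowedPrefixes) {
  // file.type is the MIME string reported by the browser, e.g. "text/plain";
  // it may be "" when the browser cannot guess a type.
  const type = file.type || "";
  return allowedPrefixes.some((prefix) => type.startsWith(prefix));
}

// In a browser this might be wired to a file input, for example:
//   input.addEventListener("change", () => {
//     const file = input.files[0];
//     if (file && !isProbablyAllowed(file, ["text/"])) {
//       alert("Please choose a text file.");
//     }
//   });

console.log(isProbablyAllowed({ type: "text/plain" }, ["text/"])); // true
console.log(isProbablyAllowed({ type: "image/png" }, ["text/"]));  // false
console.log(isProbablyAllowed({ type: "" }, ["text/"]));           // false
```

Matching on "text/" rather than "text" avoids accepting a type string that merely begins with those four letters.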

Nadh
  • As far as I'm concerned there's not enough commas on the internets so a few extras won't hurt since too many people are writing run-on trainwreck sentences like this one. – Ben Apr 26 '12 at 08:19
  • I was thinking something along these lines, but wasn't sure if I was all on my lonesome there. So, it seems the correct answer to my question would be, "good for efficiency, but useless without further validation." Which is what I suspected at the beginning :P Thanks fellas. – Ben Apr 26 '12 at 08:21