I have pretty standard javascript/XHR drag-and-drop file upload code, and just came across an unfortunate real-world snag. I have a file on my (Win7) desktop called "TEST-é-TEST.txt". In Chrome (30.0.1599.69), it arrives at the server with filename in UTF-8, which works out fine. In Firefox (24.0), the filename seems mangled when it arrives at the server.
I didn't trust what Firebug/Chrome might be telling me about the encoding, so I examined the hex of the request packet. Everything else is the same except the non-ASCII character is indeed being encoded differently in the two browsers:
Chrome: C3 A9 (this is the expected UTF-8 for that character)
Firefox: EF BF BD (UTF-8 "replacement character"?!)
Is this a Firefox bug? I tried renaming the file, replacing the é with ó, and the Firefox hex was the same... so such a mangle really seems like a browser bug. (If Firefox were confusedly sending along ISO-8859-1, for example, without touching it, I'd see an E9 byte, and I could handle that on the server side, but it shouldn't mangle it!)
Regardless of the reason, is there something I can do on either the client or server sides to correct for this? If a replacement character is indeed being sent to the server, then it would seem unrecoverable there, so I almost certainly need to do it on the client side.
And yes, the page on which this code exists has charset=utf-8
, and Firefox confirms that it perceives the page as UTF-8 under View>Character Encoding.
Furthermore, if I dump the filename to console.log, it appears fine there--I guess it's just getting mangled in/after setRequestHeader("X-File-Name",file.name)
.
Finally, it would seem that the value passed to setRequestHeader() should be able to have code points up to U+00FF, so U+00E9 (é) and U+00F3 (ó) shouldn't cause a problem, though higher codes could trigger a SyntaxError: http://www.w3.org/TR/XMLHttpRequest2/#the-setrequestheader-method