
A third party is sending us a flat file that is supposed to contain only printable ASCII characters. However, we've discovered a run of about 50 0x00 (null) bytes in the middle of the file.

We want to be able to upload the file to our web application, but I've discovered that Django doesn't seem to like the null characters in the multipart/form-data. If I remove the null characters, the upload succeeds. (Sorry, I don't have the stack trace available at the moment, but I will produce one if necessary.)

We can pre-process the file to remove the null characters and/or work with our third party to fix their file generator, but I don't like to leave mysterious problems like this unexplained.
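If you do go the pre-processing route, here is a minimal Python sketch for locating the null run before stripping it, so the offset can be reported back to the third party. It assumes the whole file fits in memory, treats Python's `string.printable` (which includes whitespace such as `\n` and `\r`) as the definition of "printable ASCII", and the helper names are hypothetical.

```python
import string
from typing import List, Tuple

# Python's notion of "printable" ASCII: letters, digits, punctuation and
# whitespace (space, \t, \n, \r, \x0b, \x0c). Assumed to match the file spec.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def find_null_runs(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) for every contiguous run of 0x00 bytes."""
    runs = []
    i, n = 0, len(data)
    while i < n:
        if data[i] == 0:
            start = i
            while i < n and data[i] == 0:
                i += 1
            runs.append((start, i - start))
        else:
            i += 1
    return runs


def non_printable_offsets(data: bytes) -> List[int]:
    """Offsets of bytes that fall outside string.printable."""
    return [i for i, b in enumerate(data) if b not in PRINTABLE]


def strip_nulls(data: bytes) -> bytes:
    """Drop every 0x00 byte, leaving all other bytes untouched."""
    return data.replace(b"\x00", b"")


if __name__ == "__main__":
    # Hypothetical sample: 7 bytes of text, 50 nulls, then more text.
    sample = b"HEADER\n" + b"\x00" * 50 + b"BODY\n"
    print(find_null_runs(sample))  # [(7, 50)]
    print(strip_nulls(sample))     # b'HEADER\nBODY\n'
```

Reporting the offsets first, rather than silently stripping, keeps the evidence you need when asking the vendor to fix their generator.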

Does this sound like a bug in Django or is there some aspect of multipart/form-data that I don't fully understand? Do I need to set a transfer encoding of some sort so Django doesn't get hung up on the null characters?

Joe Holloway
  • Null bytes work just fine, provided the MIME headers associated with the file specify the file data is using an encoding that can handle null characters correctly. – Remy Lebeau Dec 18 '09 at 20:26

1 Answer


Nope, no transfer-encoding is needed (or ever used by browsers) on form-data. It's perfectly valid to include a run of 50 null bytes in a multipart/form-data value; indeed, given that most binary files contain a lot of nulls, that situation should arise as often as not with file uploads!

That makes me question whether it's really a Django bug, or whether something else is going on. Let's have that stack trace!
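To illustrate the point, here is a minimal Python sketch that frames a payload containing 50 null bytes as a single-part multipart/form-data body by hand and reads it back. It is not Django's parser, the function names are hypothetical, and the reader only handles one part. What it shows is that a part is delimited purely by the boundary line, so nulls in the body never need escaping or a transfer encoding.

```python
import uuid
from typing import Tuple


def build_file_upload(field: str, filename: str, content: bytes) -> Tuple[bytes, str]:
    """Frame `content` as a single-part multipart/form-data body.

    The file bytes are copied verbatim: no Content-Transfer-Encoding is
    applied, so 0x00 bytes travel as-is. The part is delimited only by the
    boundary string, which must not occur inside `content`.
    """
    boundary = uuid.uuid4().hex  # random, so a collision with content is unlikely
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def read_single_part(body: bytes, boundary: str) -> bytes:
    """Recover the part's bytes by locating the delimiters, ignoring the contents."""
    start = body.index(b"\r\n\r\n") + 4  # blank line ends the part headers
    # The CRLF before "--boundary" belongs to the delimiter, not to the data.
    end = body.index(b"\r\n--" + boundary.encode("ascii"), start)
    return body[start:end]


if __name__ == "__main__":
    payload = b"ABC" + b"\x00" * 50 + b"DEF\r\n"
    body, content_type = build_file_upload("file", "data.txt", payload)
    boundary = content_type.split("boundary=", 1)[1]
    print(read_single_part(body, boundary) == payload)  # True
```

Because the parser only ever searches for the boundary, a conforming implementation has no reason to treat 0x00 specially, which is why a failure here points at something other than the format itself.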

bobince
  • Not true. There is a "Content-Transfer-Encoding" header available, i.e. "Content-Transfer-Encoding: binary". Or use "Content-Type: application/octet-stream" to send arbitrary data without interpretation. – Remy Lebeau Dec 18 '09 at 20:25
  • `Content-Transfer-Encoding` is not allowed in HTTP, see RFC2616 19.4.5. The encoding is always `binary` or an effective synonym. – bobince Dec 19 '09 at 03:16
  • @bobince: CTE is invalid for the global header of the whole HTTP transmission. However, according to [RFC 2388 Section 4.3](http://tools.ietf.org/html/rfc2388#section-4.3) it is perfectly valid for `multipart/form-data`. The way I read the spec, you should include `Content-Transfer-Encoding: binary` for every part that is non-ASCII, even if many actual implementations don't honor that requirement. – MvG Nov 23 '12 at 22:27
  • [RFC7578 sec 4.7](https://tools.ietf.org/html/rfc7578#section-4.7) (2015) mentions "Previously, it was recommended that senders use a `Content-Transfer-Encoding` encoding (such as `quoted-printable`) for each non-ASCII part of a `multipart/form-data` body because that would allow use in transports that only support a `7bit` encoding. This use is deprecated for use in contexts that support binary data such as HTTP. Senders SHOULD NOT generate any parts with a `Content-Transfer-Encoding` header field. // Currently, no deployed implementations that send such bodies have been discovered." – Nick T Apr 17 '18 at 20:57
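One way a receiver can reconcile the positions in these comments is sketched below in Python, under stated assumptions. It uses only the stdlib, the function name is hypothetical, and it is not a description of Django's actual behaviour. An absent or identity `Content-Transfer-Encoding` (`7bit`, `8bit`, `binary`) means the part's bytes are the data, as RFC 7578 expects for HTTP. `base64` or `quoted-printable` is decoded only if a legacy sender used it. In every case, null bytes survive.

```python
import base64
import quopri
from typing import Optional

# Identity encodings: the part's bytes already are the data (RFC 2045 names).
IDENTITY_ENCODINGS = {"", "7bit", "8bit", "binary"}


def decode_part_body(raw: bytes, cte: Optional[str]) -> bytes:
    """Undo a per-part Content-Transfer-Encoding, if a (legacy) sender used one."""
    encoding = (cte or "").strip().lower()
    if encoding in IDENTITY_ENCODINGS:
        return raw  # the common HTTP case: bytes, nulls included, as sent
    if encoding == "base64":
        return base64.b64decode(raw)
    if encoding == "quoted-printable":
        return quopri.decodestring(raw)  # "=00" decodes to a 0x00 byte
    raise ValueError(f"unsupported Content-Transfer-Encoding: {cte!r}")


if __name__ == "__main__":
    print(decode_part_body(b"a\x00b", None))                 # b'a\x00b'
    print(decode_part_body(b"AAAA", "base64"))               # b'\x00\x00\x00'
    print(decode_part_body(b"A=00B", "quoted-printable"))    # b'A\x00B'
```

Treating a missing header as identity matches what browsers actually send, while the fallback branches only matter for older clients that followed the RFC 2388 advice.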