I have a large web system written on top of WSGI that uses webob to access form data (no framework is involved). Every so often, a browser or bot sends undecodable escape sequences in the query string or POST data, and the request fails with an unhandled UnicodeDecodeError. I'm looking for a good default behavior that doesn't end with me getting an unhandled-exception email.

My first idea is to write a site-wide middleware that accesses the params of a webob request object inside an exception handler and returns a 400 if decoding fails (or maybe strips out the undecodable data).
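
A minimal sketch of that middleware idea, using only the standard library rather than webob so that it stands alone. The class name `RejectUndecodableQuery` and the choice of UTF-8 are assumptions for illustration, and it checks only the query string, not POST data:

```python
from urllib.parse import parse_qsl


class RejectUndecodableQuery:
    """WSGI middleware: answer 400 when the query string cannot be decoded."""

    def __init__(self, app, encoding="utf-8"):
        self.app = app
        self.encoding = encoding  # assumed site-wide encoding

    def __call__(self, environ, start_response):
        query = environ.get("QUERY_STRING", "")
        try:
            # errors="strict" makes parse_qsl raise UnicodeDecodeError
            # instead of silently substituting replacement characters.
            parse_qsl(query, keep_blank_values=True,
                      encoding=self.encoding, errors="strict")
        except UnicodeDecodeError:
            body = b"Bad Request: the query string is not valid text.\n"
            start_response("400 Bad Request", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # The query string decoded cleanly; hand off to the wrapped app.
        return self.app(environ, start_response)
```

Wrapping the existing WSGI callable once, for example `application = RejectUndecodableQuery(application)`, would apply the check to every request.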

How do other systems/frameworks handle this?

Jeremy

1 Answer

After some digging, I found that calling the .decode() method on the request at the very beginning creates a decoded copy of the request. If this fails with a UnicodeDecodeError, I send back a 400. For example:

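    import webob

    def application(environ, start_response):
        try:
            # Decode up front so bad input fails here, not deep in the app.
            req = webob.Request(environ).decode('ascii')
        except UnicodeDecodeError:
            # A Response object is itself a WSGI app, so call it directly.
            return webob.Response(status=400, body=b"""
                <h1>Bad Request</h1>
                <p>We apologize. Your request includes characters the server
                cannot understand. Please click the back button and
                check your request for non-standard characters like accent
                marks and copy-paste data from word processing
                programs.</p>""")(environ, start_response)
        # Continue handling the decoded request `req` here.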
Jeremy
  • yikes, i would hardly call accented characters "non-standard" in the year of our lord 2013, when they exist even in english words ☺ – Eevee Aug 02 '15 at 23:24
  • I would not recommend using ASCII. The default for the web is latin-1; however, many requests these days are utf-8. I would recommend trying utf-8 first, then latin-1, and returning an error response if that still fails. – X-Istence Dec 29 '15 at 23:00
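
The fallback order suggested in the last comment could be sketched roughly as follows, using only the standard library. The helper name `decode_with_fallback` is hypothetical and not part of webob. One caveat: Python's latin-1 codec maps every byte to a character, so with latin-1 as the last entry the error branch is only reached when a different encoding list is passed in.

```python
from urllib.parse import unquote_to_bytes


def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    # Reached only if every candidate encoding rejected the bytes.
    raise ValueError("none of %r could decode the input" % (encodings,))


# Percent-decode to raw bytes first, then choose an encoding.
text, used = decode_with_fallback(unquote_to_bytes("caf%C3%A9"))      # UTF-8 bytes
legacy, used_legacy = decode_with_fallback(unquote_to_bytes("caf%E9"))  # latin-1 byte
```

A caller that wants the 400 behavior from the answer could catch the `ValueError` and return the error response there.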