How to determine the encoding of request query string

Question

Suppose I have a .NET HttpModule that analyzes incoming requests to check for possible attacks like SQL injection. Now suppose that a user of my application enters the following in a form field and submits it:

&#039&#032&#079&#082&#032&#049&#061&#049

That is ' OR 1=1 written as HTML numeric character references (decimal Unicode code points). So in the request I get something like:

http://example.com/?q=%26%23039%26%23032%26%23079%26%23082%26%23032%26%23049%26%23061%26%23049

Which in my HttpModule looks fine (no SQL injection), but the server will correctly decode it to q=' OR 1=1 and my filter will fail.

So, my question is: Is there any way to know at that point what is the encoding used by the request query string, so I can decode it and detect the attack?

I guess the browser has to tell the server which encoding the request is in, so it can be correctly decoded. Or am I wrong?

score 1 · Answer 1 · answered Aug 14 '12 at 23:30

What you are seeing is URL encoding (percent-encoding), where a percent sign followed by 2 hex digits represents a single encoded byte (octet). In HTML, an entity starting with an ampersand and ending with a semicolon contains an entity name or an explicit Unicode codepoint value.

What gets sent over the wire between the browser and server is http://example.com/?q=%26%23039%26%23032%26%23079%26%23082%26%23032%26%23049%26%23061%26%23049, but logically it represents http://example.com/?q=&#039&#032&#079&#082&#032&#049&#061&#049 once the server has URL-decoded it. When your code reads the query string, it should be receiving &#039&#032&#079&#082&#032&#049&#061&#049. The server should not be decoding that any further to ' OR 1=1; you would have to do that in your own code.
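To make the two layers concrete, here is a minimal sketch in Python (used purely for illustration; the function name `decode_layers` is hypothetical). It assumes only the standard library. In .NET, the corresponding calls would presumably be `HttpUtility.UrlDecode` for the first layer and `WebUtility.HtmlDecode` for the second.

```python
import html
from urllib.parse import parse_qs, urlsplit


def decode_layers(url):
    """Return (url_decoded, html_decoded) for the 'q' parameter of url."""
    # Layer 1: percent-decoding. This is the step the web server or
    # framework normally performs before your code sees the value.
    query = urlsplit(url).query
    url_decoded = parse_qs(query)["q"][0]

    # Layer 2: HTML character-reference decoding. Nothing in the HTTP
    # pipeline does this; it only happens if some code explicitly asks for it.
    html_decoded = html.unescape(url_decoded)
    return url_decoded, html_decoded


url = ("http://example.com/?q=%26%23039%26%23032%26%23079%26%23082"
       "%26%23032%26%23049%26%23061%26%23049")
after_url, after_html = decode_layers(url)
print(after_url)   # &#039&#032&#079&#082&#032&#049&#061&#049
print(after_html)  # ' OR 1=1
```

Only the second, explicit step turns the value into something that looks like SQL.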

If you are allowing a URL query string to specify an SQL query filter as-is, then that is a mistake on your part to begin with. That suggests you are building SQL queries dynamically instead of using parameterized SQL queries or stored procedures, so you are leaving yourself open to SQL injection attacks. You should not be doing that. Parameterized SQL queries and stored procedures are not subject to injection attacks (provided the procedure does not itself build dynamic SQL from its inputs), so your clients should only be allowed to submit the individual parameter values in the URL. Your server code can then extract the individual values from the URL query and pass them to the SQL parameters as needed. The SQL engine will make sure the values are sanitized and formatted to avoid attacks. You should not be handling that manually.
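As a rough illustration of the difference, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `users` table, the function names, and the `' OR '1'='1` payload (a variant of the one in the question that stays syntactically valid when concatenated) are all assumptions made for the example.

```python
import sqlite3


def find_users(conn, name):
    # Parameterized: the value travels separately from the SQL text,
    # so it can only ever be compared as data.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


def find_users_unsafe(conn, name):
    # Dynamic SQL by string concatenation: the value becomes part of the query.
    return conn.execute(
        "SELECT name FROM users WHERE name = '" + name + "'"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "' OR '1'='1"
print(find_users(conn, payload))              # no user has that literal name
print(len(find_users_unsafe(conn, payload)))  # every row matches
```

With the parameterized version, the payload is just an odd-looking name that matches nothing.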

Thank you for your answer. I understand that allowing such code to be input in my application is a mistake. But actually what I'm trying to build is like a filter that checks the incoming requests for possible attacks, so suppose I have no control over the application. I want to be able to detect such code from the request and take action if it is a possible attack. — Forte L., Aug 15 '12 at 13:23
If the app does not allow SQL statements to be specified in the URL query to begin with, that eliminates any kind of SQL attack vector completely. Why can't you change that? That is a security hole in the app. — Remy Lebeau, Aug 15 '12 at 19:32
Maybe I haven't been clear enough. I apologize for that. Let me try again: I don't control the application. I'm building an HttpModule, which someone will install in their server to protect their applications. So I can't assume that the application has even basic security implemented. — Forte L., Aug 15 '12 at 20:10
OK, then. I explained what URL encoding is. The web server should be handling that portion for you, but if it is not then at least you have the details of how to decode it manually (replace each `%XX` string sequence with the corresponding `0xXX` byte octet). That just leaves decoding `&#XXX;` string sequences into character values (strip off `&#` and `;`, interpret `XXX` as a Unicode codepoint, convert to corresponding character). — Remy Lebeau, Aug 16 '12 at 18:45
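
The manual decoding described in the last comment could be sketched as follows. This is an illustration in Python, not a hardened decoder: both function names are hypothetical, UTF-8 is assumed as the byte encoding (the request itself does not say which one was used), and only decimal references are handled.

```python
import re


def percent_decode(text, encoding="utf-8"):
    """Replace each %XX sequence with the byte 0xXX, then decode the bytes."""
    raw = re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text.encode(encoding),
    )
    # Assumption: the bytes are UTF-8; invalid sequences become U+FFFD.
    return raw.decode(encoding, errors="replace")


def numeric_entity_decode(text):
    """Replace decimal &#NNN or &#NNN; with the character at that code point."""
    def to_char(match):
        code = int(match.group(1))
        # Leave out-of-range values untouched rather than raising.
        return chr(code) if code <= 0x10FFFF else match.group(0)

    return re.sub(r"&#([0-9]+);?", to_char, text)


step1 = percent_decode("%26%23039%26%23032%26%23079%26%23082")
step2 = numeric_entity_decode(step1)
print(step1)  # &#039&#032&#079&#082
print(step2)  # ' OR
```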

score 1 · Accepted Answer · answered Aug 15 '12 at 23:24

the server will correctly decode it to q=' OR 1=1

It shouldn't. There is no valid reason(*) an application would HTML-decode the &#039... string before using it in an SQL query. HTML-decoding is a client-side occurrence.

(* there's the invalid reason: that the application author doesn't have the foggiest idea what they're doing, tries to write an input-HTML-escaping function - a misguided idea in the first place - and due to incompetence writes an input-de-escaping function instead... but that would be an unlikely case. Hopefully.)

Is there any way to know at that point what is the encoding used by the request query string

No. Some Web Application Firewalls attempt to get around this by applying every decoding scheme they can think of to the incoming data, and triggering if any of them match something suspicious, just in case the application happens to have an arbitrary decoder of that type sitting between the input and a vulnerable system.

This can result in a performance hit as well as increased false positives, and doubly so for the WAFs that try all possible combinations of two or more decoders. (e.g., is T1IrMQ a base-64-encoded, URL-encoded OR 1 SQL attack, or just a car numberplate?)
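
A toy version of that "try every decoder chain" approach, sketched in Python, shows both the idea and the false-positive problem. Everything here is an assumption for illustration: the three decoders, the deliberately crude `SUSPICIOUS` pattern, and the `find_matches` helper do not come from any real WAF.

```python
import base64
import binascii
import html
import itertools
import re
from urllib.parse import unquote_plus


def b64_decode(text):
    """Return base64-decoded text, or None if it is not valid base64/UTF-8."""
    try:
        padded = text + "=" * (-len(text) % 4)
        return base64.b64decode(padded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None


DECODERS = {"url": unquote_plus, "html": html.unescape, "base64": b64_decode}

# Hypothetical, deliberately simplistic signature.
SUSPICIOUS = re.compile(r"\bOR\s+1\b", re.IGNORECASE)


def find_matches(value, max_depth=2):
    """Return every decoder chain (up to max_depth) whose output looks suspicious."""
    hits = []
    for depth in range(max_depth + 1):
        for chain in itertools.product(DECODERS, repeat=depth):
            text = value
            for name in chain:
                text = DECODERS[name](text)
                if text is None:  # this decoder did not apply
                    break
            if text is not None and SUSPICIOUS.search(text):
                hits.append(chain)
    return hits


# T1IrMQ -> base64 -> "OR+1" -> URL-decode -> "OR 1"
print(find_matches("T1IrMQ"))  # [('base64', 'url')]
```

The number of chains grows as the number of decoders raised to the depth, and the harmless-looking numberplate is flagged, which is the trade-off described above.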

Quite how far you take this idea is a trade-off between how many potential attacks you catch and how much negative impact you have on real users of the app. There's no one 'correct' solution because ultimately you can never provide complete protection against app vulnerabilities in a layer outside the app (aka "WAFs don't work").
