What do you all think is the correct (read: most flexible, loosely coupled, most robust, etc.) way to make user input from the web safe for use in various parts of a web application? Obviously we can just use the respective sanitization functions for each context (database, display on screen, save on disk, etc.), but is there some general "pattern" for handling unsafe data and making it safe? Is there an established way to enforce treating it as unsafe unless it is properly made safe?
-
I'm specifically interested in making data safe for use within the application after it has already passed through a simple validation layer that checks the general format of the response but doesn't look for, say, SQL injection or other threats. Basically, I want ways to handle the data that will not result in significantly destructive behavior by the program. – Anonymous Aug 16 '09 at 06:42
-
I had in mind something along the lines of: User input might get wrapped in a sort of wrapper that prevents code from using it in unsafe ways. That input might then need to be unwrapped in a particular way that makes it difficult for someone to UNKNOWINGLY do something unsafe with it. The code using it would need to ensure it was made safe for use in the context of that code before using it. Is there something like this that has been established? – Anonymous Aug 16 '09 at 06:45
-
You mention at one point that you want something broad and flexible and at another point that you were imagining a framework of sorts. So you don't want a simple solution in the sense of "doing this one thing will scrub the input" so much as "doing this one thing will trigger 10 other things automatically that will scrub input", is that right? I'm not trying to be cheeky, I'm just unsure if you are looking for a method or a tool. More to come... – Anthony Aug 16 '09 at 10:57
-
The other thing confusing me is what you have in mind by secure. All lectures aside, your comment above is a bit fuzzy. You want something to wrap the input (which makes sense and is a good way of putting it) but who are you ultimately trying to protect by wrapping and unwrapping the data? Are you wanting to avoid SQL injection? Unintended queries? XSS? or malicious browser-side output? – Anthony Aug 16 '09 at 11:00
-
Anthony, yes, you are correct that I don't want "one thing to scrub the input" but rather "doing one thing will trigger 10 other things to automatically scrub the input", which would depend on how the information is being used and so on. As for what I'm trying to stay secure from: *all of the above and anything else*. I was thinking there might be some established way of preventing data from being used improperly while providing facilities to let it be used properly. If not, then that's an answer to my question. – Anonymous Aug 16 '09 at 23:08
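The wrapper idea described in these comments can be sketched roughly as follows. This is a minimal illustration in Python (standard library only), not an established framework: the class name `Untrusted` and its method names are hypothetical. The point is that the raw string cannot be used by accident, and every way out names the context it is safe for.

```python
import html
from typing import Iterable


class Untrusted:
    """Holds raw user input and refuses to hand it out implicitly."""

    __slots__ = ("_raw",)

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def __str__(self) -> str:
        # Blocks accidental use in str(), f-strings and "%s" formatting.
        raise TypeError("untrusted input must be unwrapped for a specific context")

    def __repr__(self) -> str:
        # Safe to log: never reveals the raw value.
        return "Untrusted(<hidden>)"

    def for_html(self) -> str:
        """Return the value encoded for HTML text or attribute values."""
        return html.escape(self._raw, quote=True)

    def as_int(self, minimum: int, maximum: int) -> int:
        """Parse as an integer within a range, or raise ValueError."""
        value = int(self._raw)
        if not minimum <= value <= maximum:
            raise ValueError("value out of range")
        return value

    def one_of(self, allowed: Iterable[str]) -> str:
        """Return the value only if it appears in a whitelist."""
        if self._raw not in set(allowed):
            raise ValueError("value not in whitelist")
        return self._raw

    def for_sql_parameter(self) -> str:
        """Return the raw value, intended only for use as a bound query parameter."""
        return self._raw
```

Python cannot truly hide `_raw`, so this only makes it hard to do something unsafe unknowingly, which is what the comment asks for. A determined caller can still reach the raw value.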
3 Answers
As has already been said, there are several things to take into account when you are concerned about web security. Here are some basic principles to keep in mind:
- Avoid direct input from users being integrated into queries and variables.
So this means don't have something like $variable = $_POST['user_input']. For any situation like this, you are handing over too much control to the user. If the input affects some database query, always have whitelists to validate user input against. If the query is for a user name, validate against a list of good user names. Do NOT simply make a query with the user input dropped right in.
One (possible) exception is for a search string. In this case, you need to sanitize, simple as that.
- Avoid storing user input without sanitization.
If the user is creating a profile or uploading info for other users, you have to either have a whitelist of what kind of data is acceptable, or strip out anything that could be malicious. This is not only for your system's security, but also for your other users' (see next point).
- NEVER output anything from a user to the browser without stripping it.
This is probably the most important thing that security consultants have emphasized to me. You cannot simply rely on sanitizing input when it is received from the user. If you did not write the output yourself, always ensure that the output is innocuous by encoding any HTML special characters. (Wrapping it in a <plaintext> tag is not a reliable alternative, since that element is obsolete in HTML.) It is simple negligence on the part of the developer if user A uploads a bit of JavaScript that harms any other users who view that page. You will sleep better at night knowing that any and all user output can do nothing but appear as text in all browsers.
- Never allow anyone but the user to control the form.
XSS is easier than it should be and a real pain to cover in one paragraph. Simply put, whenever you create a form, you are giving users access to a script that will handle form data. If I steal someone's session or someone's cookie, I can now talk to the script as though I were on the form page. I know the type of data it expects and the variable names it will look for. I can simply pass it those variables as though I were the user, and the script can't tell the difference.
The above is not a matter of sanitization but of user validation. My last point is directly related to this idea.
- Avoid using cookies for user validation or role validation.
If I can steal a user's cookie, I may be able to do more than make that one user have a bad day. If I notice the cookie has a value called "member", I can very easily change that value to "admin". Maybe it won't work, but for many scripts, I would have instant access to any admin-level info.
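As a rough sketch of the last point, the cookie can carry nothing but an opaque random identifier while the role stays on the server. This is Python with the standard library only; the in-memory `_SESSIONS` store and the function names are hypothetical stand-ins for whatever session storage the application really uses.

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory session store; a real application would use a
# database or cache shared between requests.
_SESSIONS: Dict[str, Dict[str, str]] = {}


def start_session(username: str, role: str) -> str:
    """Create a session and return the only value that goes into the cookie."""
    session_id = secrets.token_urlsafe(32)  # unguessable and meaningless by itself
    _SESSIONS[session_id] = {"username": username, "role": role}
    return session_id


def role_for(cookie_value: str) -> Optional[str]:
    """Look the role up server-side; the cookie never states a role."""
    session = _SESSIONS.get(cookie_value)
    return session["role"] if session else None


def require_admin(cookie_value: str) -> bool:
    return role_for(cookie_value) == "admin"
```

Editing the cookie to say "admin" now just produces an unknown session ID. A stolen session ID still lets the thief act as that one user, so this limits privilege escalation rather than preventing session theft.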
Simply put, there is no one easy way to secure a web form, but there are basic principles that simplify what you should be doing, and thus ease the stress of securing your scripts.
Once more for good measure:
- Sanitize all input
- Encode all output
- Validate any input used for execution against a strict whitelist
- Make sure the input is coming from the actual user
- Never make any user or role-based validation browser-side/user-modifiable
And never assume that any one person's list is exhaustive or perfect.
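Two of the items in this recap, whitelist validation and output encoding, can be sketched in a few lines. This is a minimal Python illustration using only the standard library; the column names and function names are made up for the example.

```python
import html

# Hypothetical whitelist of values the application is willing to act on.
ALLOWED_SORT_COLUMNS = {"name", "created", "score"}


def choose_sort_column(user_value: str) -> str:
    """Accept the value only if it is one the application already knows."""
    if user_value not in ALLOWED_SORT_COLUMNS:
        raise ValueError("unsupported sort column")
    return user_value


def render_comment(author: str, body: str) -> str:
    """Encode user-supplied text at output time, whatever happened at input time."""
    return "<p><b>{}</b>: {}</p>".format(html.escape(author), html.escape(body))
```

With this, `render_comment("Eve", "<script>steal()</script>")` yields markup in which the script appears as visible text rather than running.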

I'm more than a little sceptical that such a general purpose framework could both exist and be less complex than a programming language.
The definition of "safe" is so different between different layers:
- Input field validation, numbers, dates, lists, postcodes, vehicle registrations
- Cross field validation
- Domain validation - is that a valid meter reading? Miss Jones used £300,000,000 of electricity this month?
- Inter-request validation - are you really booking two transatlantic flights for yourself on the same day?
- Database consistency, foreign key validation
- SQL injection
Also consider the actions when violations are discovered.
- At the UI layer we almost certainly do not just quietly strip out non-digit chars from numeric fields; we raise a UI error
- In the UI we probably want to validate all fields and flag each individual error
- In other layers we might throw an exception or initiate a business process
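To make the difference between layers concrete, here is a minimal Python sketch under assumed rules. The field names, the `DomainError` class, and the plausibility threshold are all hypothetical. The UI layer collects every field error so they can all be flagged at once, while the domain layer raises an exception.

```python
import re
from typing import Dict


class DomainError(Exception):
    """Raised when well-formed input makes no sense for the business."""


def validate_form(fields: Dict[str, str]) -> Dict[str, str]:
    """UI layer: check every field and report each error, stripping nothing."""
    errors: Dict[str, str] = {}
    if not re.fullmatch(r"[0-9]{1,7}", fields.get("reading", "")):
        errors["reading"] = "Enter the reading as digits only."
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", fields.get("date", "")):
        errors["date"] = "Enter the date as YYYY-MM-DD."
    return errors


MAX_PLAUSIBLE_INCREASE = 5000  # assumed threshold, in meter units


def record_reading(previous: int, current: int) -> int:
    """Domain layer: the format is fine, but is the value believable?"""
    if current < previous:
        raise DomainError("reading is lower than the previous one")
    if current - previous > MAX_PLAUSIBLE_INCREASE:
        raise DomainError("implausibly large consumption")
    return current - previous
```

Neither function knows anything about SQL injection or HTML, which illustrates the point: each layer has its own notion of "safe".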
Perhaps I'm missing your vision? Have you seen anything that gets close to what you have in mind?

You cannot use a single method to sanitize data for all uses, but a good start is:
- Use filter_var to validate/sanitize
filter_var takes a number of different types of data and strips out bad characters (such as non-digits from values you expect to be numbers) or checks that the value has a valid format (such as IP addresses).
Note: valid email addresses are far more varied than filter_var's implementation allows for, so search around for a more thorough validator.
- Use mysql_real_escape_string before inserting anything into a MySQL database
I wouldn't suggest calling this until you are about to insert the data into the database, and it is better to just use prepared mysqli statements anyway, since the old mysql extension was removed in PHP 7.
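The same two steps, checking the format first and then binding parameters instead of escaping, look roughly like this in Python. This is an analogue using the standard `ipaddress` and `sqlite3` modules, not a translation of filter_var or mysqli; the table and helper names are invented for the sketch, and it rejects bad values rather than stripping characters.

```python
import ipaddress
import sqlite3


def parse_age(raw: str) -> int:
    """Reject anything that is not an integer in a sensible range."""
    value = int(raw)  # raises ValueError on non-numeric input
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value


def parse_ip(raw: str) -> str:
    """Accept only a well-formed IPv4 or IPv6 address."""
    return str(ipaddress.ip_address(raw))  # raises ValueError otherwise


def save_visit(conn: sqlite3.Connection, name: str, age_raw: str, ip_raw: str) -> None:
    # Placeholders keep the values out of the SQL text entirely,
    # so no escaping function is needed.
    conn.execute(
        "INSERT INTO visits (name, age, ip) VALUES (?, ?, ?)",
        (name, parse_age(age_raw), parse_ip(ip_raw)),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (name TEXT, age INTEGER, ip TEXT)")
save_visit(conn, "Robert'); DROP TABLE visits;--", "30", "192.168.0.1")
```

The hostile-looking name is stored as an ordinary string and the table survives, because the value never becomes part of the statement.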

-
I know there isn't a single function to sanitize for all contexts. As I was saying, "Obviously we can just use the respective sanitization functions for each context...". filter_var does seem potentially useful. I guess I'm looking for a broader solution. Something very flexible. Maybe a sort of data handling framework. – Anonymous Aug 16 '09 at 03:01
-
Maybe you should look into an ORM like Doctrine? http://www.doctrine-project.org/ – camomileCase Aug 16 '09 at 05:16