2

I am a PHP newbie and am working on a basic form validation script. I understand that input filtering and output escaping are both vital for security reasons. My question is whether or not the code I have written below is adequately secure? A few clarifying notes first.

  1. I understand there is a difference between sanitizing and validating. In the example field below, the field is plain text, so all I need to do is sanitize it.
  2. $clean['myfield'] is the value I would send to a MySQL database. I am using prepared statements for my database interaction.
  3. $html['myfield'] is the value I am sending back to the client so that when s/he submits the form with invalid/incomplete data, the sanitized fields that have data in them will be repopulated so they don't have to type everything in from scratch.

Here is the (slightly cleaned up) code:

$clean = array();
$html = array();
$_POST['fname'] = filter_var($_POST['fname'], FILTER_SANITIZE_STRING);
$clean['fname'] = $_POST['fname'];
$html['fname'] = htmlentities($clean['fname'], ENT_QUOTES, 'UTF-8');
if ($_POST['fname'] == "") {
    $formerrors .= 'Please enter a valid first name.<br/><br/>';
}
else {
    $formerrors .= 'Name is valid!<br/><br/>';
}

Thanks for your help!

~Jared

  • As a slight aside: Regarding output filtering, you cannot necessarily "clean" the variable for output, until you know *where* and *in which* output it will appear -- insertion into different parts of html/javascript require different rules; as does insertion into emails, PDFs etc. However, your approach will keep you safe from a majority of xss vectors into an HTML area of the document, where the browser is correctly handling multibyte chars - however I would never guarantee complete safety. You may find https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet useful. – Cheekysoft Jan 04 '12 at 09:46

3 Answers3

5

I understand that input filtering and output escaping are both vital for security reasons.

I'd say rather that output escaping is vital for security and correctness reasons, and input filtering is potentially-useful measure for defence-in-depth and to enforce specific application rules.

The input filtering step and the output escaping step are necessarily separate concerns, and cannot be combined into one step, not least because there are many different types of output escaping, and the right one has to be chosen for each output context (eg HTML-escaping in a page, URL-escaping to make a link, SQL-escaping, and so on).

Unfortunately PHP is traditionally very hazy on these issues and so offers a bunch of mixed-message functions that are likely to mislead you.

In the example field below, the field is plain text, so all I need to do is sanitize it.

Yes. Alas, FILTER_SANITIZE_STRING is not in any way a sane sanitiser. It completely removes some content (strip_tags, which is itself highly non-sensible) whilst HTML-escaping other content. eg quotes turn into &#34;. This is a nonsense.

Instead, for input sanitisation, look at:

  • checking it's a valid string for the encoding you're using (hopefully UTF-8; see eg this regex for that);

  • removing control characters, U+0000–U+001F and U+007F–U+009F. Allow the newline through only on deliberate multi-line text fields;

  • removing the characters that are not suitable for use in markup;

  • validating the input conforms to application requirements on a field-by-field basis, for data whose content model is more specific than arbitrary text strings. Although your escaping should handle a < character correctly, it's probably a good idea to get rid of it early in fields where it makes no sense to have one.

For the output escaping step I'd generally prefer htmlspecialchars() to htmlentities(), though your correct use of the UTF-8 argument stops the latter function breaking in the way it usually does.

bobince
  • 528,062
  • 107
  • 651
  • 834
  • Thanks for the feedback. To check that the string is UTF-8 encoded I am using the Content-Type meta tag for my page to declare UTF-8 encoding. In addition to this, would `$_POST['myfield'] = iconv("UTF-8", "UTF-8//IGNORE", $_POST['myfield']);` help or is this pointless? For removal of control characters and characters unsuitable for use in markup, would I use a `preg_replace` function? Thanks again for the help. –  Jan 04 '12 at 02:37
  • Declaring charset in Content-Type is a good move, and will usually ensure that the browser submits content in UTF-8, but it doesn't prevent deliberately malformed byte sequences being submitted, so, yes, an input filtering step is necessary if you want to protect the few older browsers that permit overlong encodings. The `iconv` transform should work to ensure UTF-8; other possibilities include `mb_check_encoding`, and combining the UTF-8 check and the control character filter by doing both with a single `preg_replace` with the `/u` suffix. – bobince Jan 04 '12 at 10:18
  • OK. This is starting to make sense. So if I wanted to ensure UTF-8 encoding and allow only basic English punctuation, would a `preg_replace` such as this: `$clean['myfield'] = preg_replace("/[^A-Za-z0-9!#%&()-;:,.'?[:space:]]/u", "", $_POST['myfield'])` work? –  Jan 04 '12 at 17:17
1

Depending on what you want to secure, the filter you call might be overactive (see comments). Injectionwise you should be safe since you're using Prepared Statements (see this answer)

On a design note you might want to filter first, then check for empty values. Doing that you can shorten your code ;)

Community
  • 1
  • 1
  • Thanks for the reply. First, by "overactive" do you mean that it may catch too much valid data and strip it out? If so, what would be a better option? Second, I take it quotes _should_ be encoded, correct? In this case would FILTER_FLAG_ENCODE_HIGH do the trick in terms of preventing SQL injection? Last, thanks for the design note. I'll definitely do some code cleanup. –  Jan 03 '12 at 23:42
  • the FILTER_FLAG_ENCODE_HIGH encodes higher up characters, like the é, but those were already stripped by the function, just like linebreaks and such (see [link](http://www.php.net/manual/en/filter.filters.sanitize.php#103379)). That's why I said overactive, depending on what you want to sanitize... Concerning the SQL injection (silly me) you shouldn't worry too much if you use Prepared Statements (reason: [link](http://stackoverflow.com/a/60496/1068386)) – canihavesomecoffee Jan 03 '12 at 23:52
  • this is very helpful, thanks! One last question: in terms of escaping output, prepared statements will prevent SQL injection when the data is sent to the database, BUT in the event that the user submits the form with empty/invalid fields, I want to repopulate the form fields with sanitized data so they don't have to retype everything from scratch. In this case, can I repopulate the form fields with `$clean['fname']` or with `$html['fname']`(the problem with the latter being that all special characters are converted to HTML entities)? Thanks a bunch for your help. –  Jan 04 '12 at 01:03
  • @Jared You should test both scenarios and see what you want to use, depending on the things you want to allow or not :) – canihavesomecoffee Jan 04 '12 at 09:48
0

I understand that input filtering ... is vital for security reasons.

This is wrong statement.
Although it can be right in some circumstances, in such a generalised form it can do no good but false feeling of safety.

all I need to do is sanitize it.

There is no such thing like "general sanitizing". You have to understand each particular case and it's limitations. For example, for the database you need to use several different sanitization techniques, not one. While for the filenames it is going to be completely different one.

I am using prepared statements for my database interaction.

Thus, you should not touch the data at all. Just leave it as is.

Here is the (slightly cleaned up) code:

It seems there is some overkill in your code.
you are cleaning your HTML data twice while it is possible that you won't need it at all. and for some reason you are raising an error on success.

I'd make it rather this way

$formerrors = '';
if ($_POST['fname'] == "") {
    $formerrors .= 'Please enter a valid first name.<br/><br/>';
}

if (!$formerrors) {
  $html = array();
  foreach ($_POST as $key => $val) {
    $html[$key] = htmlspecialchars($val,ENT_QUOTES);
  }
}
Your Common Sense
  • 156,878
  • 40
  • 214
  • 345