This may get marked as duplicate, but in my defense, I've been searching around for a while, and a lot of the information I find is in relation to mysql or mysqli at best, or is incomplete. I want a thorough, up-to-date answer that factors in using PDO and prepared statements.

What is the proper way to handle data as it moves through an application?

Is the following theoretical flow of data adequate, and if not, what improvements would you recommend?

  1. Proper form validation on the client side.
  2. Use of $_POST rather than $_GET.
  3. In PHP, using $variable = htmlentities($_POST['variable']); just before database insertion.
  4. Using PDO and prepared statements, like: bindValue(':variable', $variable);
  5. On output, using echo htmlspecialchars($variable); to prevent XSS attacks.

Two related questions:

  • Let's say you're using htmlentities() on data before database insertion. How can you also remove all the garbage that gets inserted if a user enters, say, <p>my input value</p>? This writes: &lt;tr&gt;&lt;p&gt;my input value&lt;/p&gt;&l to the database.
  • If your PHP is returning a JSON array handled by AJAX, how do you handle output in that scenario? This doesn't work in PHP: htmlspecialchars($JSON_Array)

Thanks in advance for your help on this.

hyphen
  • Step 3 is not needed and misguided. In step 5, `htmlspecialchars` is only applicable to HTML output - it could be generalized to "encode output in the way that is appropriate for the output context - `htmlspecialchars` for HTML, `json_encode` for JSON, XML Document methods for XML, etc.". – DCoder Jan 16 '14 at 13:05
  • Step 2 is perhaps questionable (the difference is semantic, not technical) but 3 is outright wrong. – Jon Jan 16 '14 at 13:06
  • @DCoder - what would be the appropriate way to handle step 3, or I guess I should say, what should step 3 be? – hyphen Jan 16 '14 at 13:44
  • There is no need for any step 3. Pass your data directly to PDO as bind variables, it will take care of safely storing it for you. – DCoder Jan 16 '14 at 14:21

1 Answer

Generally speaking your flow will be similar to this:

  1. "Validate" data on the client side. Never rely on this validation, because nothing that comes from the client can be trusted; it exists only to make the user experience better.

  2. Validate on the server: make sure the data given to you is valid. For example, validate the type (int, string, etc.) and the value (users can't order a negative amount of an item). If you're using some kind of MVC-ish framework, this is done in the Model layer.

  3. Store the data in the database. Use prepared statements to protect yourself from SQL injection, but don't manipulate the data in any other way (no htmlentities or the like).

  4. Whenever you take data out of the database, decide whether you need to convert HTML entities or do some other processing, based on whether you're outputting HTML, JSON, XML, etc.
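
As a rough illustration of steps 2 to 4, here is a minimal sketch. The `orders` table, the in-memory SQLite connection, and the helper names (`validateQuantity`, `storeOrder`, `renderOrderHtml`) are all made up for the example; swap in your own schema and DSN.

```php
<?php
declare(strict_types=1);

// Step 2 (hypothetical helper): accept only whole numbers >= 1.
function validateQuantity($raw): ?int
{
    $qty = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $qty === false ? null : $qty;
}

// Assumption: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL, quantity INTEGER NOT NULL)');

// Step 3: store the value exactly as submitted, bound through a prepared statement.
function storeOrder(PDO $pdo, string $item, int $quantity): int
{
    $stmt = $pdo->prepare('INSERT INTO orders (item, quantity) VALUES (:item, :quantity)');
    $stmt->bindValue(':item', $item, PDO::PARAM_STR);
    $stmt->bindValue(':quantity', $quantity, PDO::PARAM_INT);
    $stmt->execute();
    return (int) $pdo->lastInsertId();
}

// Step 4: escape only at output time, for the context being written to (HTML here).
function renderOrderHtml(PDO $pdo, int $id): string
{
    $stmt = $pdo->prepare('SELECT item, quantity FROM orders WHERE id = :id');
    $stmt->bindValue(':id', $id, PDO::PARAM_INT);
    $stmt->execute();
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return '';
    }
    return '<li>' . htmlspecialchars($row['item'], ENT_QUOTES, 'UTF-8')
        . ' x ' . (int) $row['quantity'] . '</li>';
}

$quantity = validateQuantity('3');
if ($quantity !== null) {
    $id = storeOrder($pdo, '<p>my input value</p>', $quantity);
    echo renderOrderHtml($pdo, $id), PHP_EOL;
}
```

The database row keeps the raw `<p>` tags; only the HTML output is escaped, so the same row could later be sent as JSON or XML without having to undo anything.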

If you need to use htmlspecialchars or something like that on data in a JSON array, execute that before you put the data in the JSON array.
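
A minimal sketch of that ordering, assuming the client will drop the values straight into HTML. The `$rows` data and the `escapeForHtml` helper are hypothetical; if the client inserts values as plain text instead, `json_encode` alone should be enough.

```php
<?php
declare(strict_types=1);

// Hypothetical rows, as they might come back from the database.
$rows = [
    ['id' => 1, 'comment' => '<p>my input value</p>'],
    ['id' => 2, 'comment' => 'Tom & "Jerry"'],
];

// Escape every string value first; non-string values (ints, null) are left alone.
function escapeForHtml(array $rows): array
{
    array_walk_recursive($rows, function (&$value): void {
        if (is_string($value)) {
            $value = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        }
    });
    return $rows;
}

// Then build the JSON; json_encode handles the JSON-level encoding.
$json = json_encode(escapeForHtml($rows), JSON_THROW_ON_ERROR);
echo $json, PHP_EOL;
```

Calling `htmlspecialchars` on the array itself fails because it expects a string, which is why the escaping is applied per value before encoding.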

Benny Hill
  • In regards to #3: what is the most appropriate way to prevent someone from storing inappropriate data in my database? I've removed my htmlentities foolishness, but in this configuration it would still allow someone to store HTML tags if they wanted to, and I would prefer not to allow that. I can set up a regex client side, but given what you said above about not trusting client-side data, how would I remove these in PHP? – hyphen Jan 17 '14 at 11:29
  • Is strip_tags() the best way to do this? – hyphen Jan 17 '14 at 11:57
  • @hyphen For HTML filtering I like [HTML Purifier](http://htmlpurifier.org/). For other types of validation you can do regex in PHP on the server side as well, take a look at [preg_match](http://us3.php.net/preg_match) and related functions. – Benny Hill Jan 17 '14 at 14:01
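
For the regex route, a small server-side sketch using only built-in functions might look like the following. Both rules and both function names are invented for illustration, and this does not use HTML Purifier; it simply rejects input rather than cleaning it.

```php
<?php
declare(strict_types=1);

// Hypothetical rule: usernames are 3 to 20 letters, digits, or underscores.
function isValidUsername(string $input): bool
{
    return preg_match('/\A[A-Za-z0-9_]{3,20}\z/', $input) === 1;
}

// Hypothetical rule: flag any input that strip_tags() would alter.
function containsHtmlTags(string $input): bool
{
    return strip_tags($input) !== $input;
}

var_dump(isValidUsername('hyphen_31'));
var_dump(containsHtmlTags('<p>my input value</p>'));
var_dump(containsHtmlTags('plain text'));
```

Rejecting the submission and asking the user to fix it is one option; silently stripping with `strip_tags` is another, but it changes what the user typed, so which is appropriate depends on the field.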