7

I am using the htmlspecialchars() function to prevent XSS attacks. I am unsure which of the following is the better way to store the data in the database.

Method 1: Store the user input after applying htmlspecialchars(). With this approach, the user input "<script>" is stored as "&lt;script&gt;".

Method 2: Store the user input as it is and apply htmlspecialchars() when retrieving the data and displaying it on the page.

My doubt is that with method 1 there will be overhead on the database, while with method 2 the data needs to be converted again each time it is requested through PHP. So I am not sure which one is better.

For more information, I am using htmlspecialchars($val, ENT_QUOTES, "UTF-8"), so ' and " are converted as well.

Please help me clear up my doubt, and provide an explanation if possible.

Thanks.

Wesley van Opdorp
Vivek Vaghela

4 Answers

12
  1. Why do you expect that you will always use the data in an HTML context? "I <3 you" and "I &lt;3 you" are not the same data. Therefore, store the data in the database as it was intended. There's no reason to store it escaped.
  2. HTML escaping the data when and only when necessary gives you the confidence to know what you're doing. This:

    echo htmlspecialchars($data);
    

    is a lot better than:

    echo $data; // The data should already come escaped from the database.
                // I hope.
    
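To illustrate the first point, here is a minimal sketch of the same unescaped value being used in two output contexts. The `$data` value and the `message` key are made up for the example; the point is only that each context needs its own encoding, applied at output time.

```php
<?php
// Hypothetical value as it would come back from the database, stored unescaped.
$data = 'I <3 you';

// HTML context: escape at the moment of output.
$html = htmlspecialchars($data, ENT_QUOTES, 'UTF-8');

// JSON context: the same raw value needs a different encoding.
$json = json_encode(['message' => $data]);

echo $html, PHP_EOL; // I &lt;3 you
echo $json, PHP_EOL; // {"message":"I <3 you"}
```

Had the value been stored as "I &lt;3 you", the JSON consumer would receive the HTML entity instead of the original text.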
deceze
  • 4
    3. If there is a bug in the escaping function, how do you fix the problem? Edit the entire database? – Erlend Mar 02 '12 at 03:38
  • @Erlend of course, but that's got nothing to do with this case in particular. If faulty data is written, you'll have to repair the content when you find out. That's true everywhere you write to a database (or to any document). – Mr Lister Mar 02 '12 at 07:49
  • @Mr Lister: It's very relevant to the question asked. If you store the data unescaped and escape on display, you don't have to touch the database when the faulty escaping logic is discovered. You only change the escaping logic. In most cases this is simpler than fixing the data in the database. – Erlend Mar 02 '12 at 16:44
7

An even better reason is that on truncating to fit a certain space you'll get stuck with abominations such as "&quo...". Resist the temptation to fiddle with your data more than the minimum required. If you're worried about reprocessing the data, cache it.
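The truncation problem can be reproduced with a small sketch. The `truncate()` helper below is hypothetical (not a PHP built-in), and the example assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper: cut a string to $max characters and append "...".
function truncate(string $text, int $max): string
{
    return mb_strlen($text, 'UTF-8') > $max
        ? mb_substr($text, 0, $max, 'UTF-8') . '...'
        : $text;
}

$raw = '"Hello" she said';
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Truncating already-escaped text can cut an entity in half.
$broken = truncate($escaped, 4);

// Truncating the raw text first, then escaping, keeps entities whole.
$clean = htmlspecialchars(truncate($raw, 4), ENT_QUOTES, 'UTF-8');

echo $broken, PHP_EOL; // &quo...
echo $clean, PHP_EOL;  // &quot;Hel...
```

With unescaped storage, the character count also reflects what the user typed rather than the length of the entities.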

Ignacio Vazquez-Abrams
4

My recommendation is to store the data in the database in its purest form. The only reason you would want to convert it into &lt;script&gt; is that you'll need to display it in an HTML document later. But the database itself has no need to know what you do with the data after you retrieve it.

Mr Lister
-1

As well as XSS attacks, shouldn't you also be worried about SQL injection attacks if you're putting user input into a database? In which case, you will want to escape the user input BEFORE putting it into the database anyway.
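Note that protecting the query and protecting the page are separate steps. As a rough sketch, assuming PDO with the SQLite driver is available (the `comments` table and `body` column are invented for the example), a bound parameter handles the SQL side without altering the stored value, and htmlspecialchars() is still applied only on output:

```php
<?php
// Assumption: PDO with the SQLite driver; table and column names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (body TEXT)');

$input = "<script>alert('x')</script>";

// The bound parameter is treated as data, not as SQL, so no manual escaping is needed.
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute([':body' => $input]);

// The value comes back exactly as it was entered.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();

// HTML escaping happens only when the value is written into a page.
$output = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $output, PHP_EOL;
```

The same pattern should carry over to other PDO drivers, though that is not shown here.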

msgmash.com