
Simple question that keeps bugging me.

Should I HTML encode user input right away and store the encoded contents in the database, or should I store the raw values and HTML encode when displaying?

Storing encoded data greatly reduces the risk of a developer forgetting to encode the data when it's being displayed. However, storing the encoded data makes data mining somewhat more cumbersome, and it takes up a bit more space, even though that's usually a non-issue.

IAdapter
Mark S. Rasmussen

4 Answers


i'd strongly suggest encoding information on the way out. storing raw data in the database is useful if you wish to change the way it's viewed at a certain point. the flow should be something similar to:

sanitize user input -> protect against sql injection -> db -> encode for display

think about a situation where you might want to display the information as an RSS feed instead. having to redo any HTML specific encoding before you re-display seems a bit silly. any development should always follow the "don't trust input" meme, whether that input is from a user or from the database.
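
A minimal sketch of that flow in Python, assuming an in-memory SQLite table; the `comments` table and the helper names are made up for illustration. The raw text is stored through a parameterized query, and each output format applies its own escaping only at render time:

```python
import html
import sqlite3
from xml.sax.saxutils import escape as xml_escape


def save_comment(conn, text):
    # Parameterized query: guards against SQL injection, stores the text unmodified.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))


def load_comments(conn):
    return [row[0] for row in conn.execute("SELECT body FROM comments ORDER BY id")]


def render_html(body):
    # HTML view: escape at output time.
    return "<p>" + html.escape(body) + "</p>"


def render_xml_title(body):
    # Feed-style XML view: same raw data, different escaping rules.
    return "<title>" + xml_escape(body) + "</title>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
save_comment(conn, 'Tom & "Jerry" <script>alert(1)</script>')

raw = load_comments(conn)[0]
html_view = render_html(raw)
xml_view = render_xml_title(raw)
```

Because the database holds the raw value, adding a third output format later only means writing another render function; nothing stored has to be decoded first.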

Owen
  • How do subsequent queries work when you're doing a SELECT..WHERE and some of the values have HTML encoding and others don't? – DOK Oct 21 '08 at 21:04
  • ugh, sounds kinda messy. it really depends on your specifics, but if i inherited a project where i needed to create new views, and the info was half encoded, i'd probably re-store the information unencoded to make life easier in the long run. – Owen Oct 21 '08 at 21:06
  • To add onto this, if your encoding process for display is expensive (for example, you're allowing HTML and are running HTML Purifier on it), caching the filtered version can be an option. Disk space is cheap. – Edward Z. Yang Oct 21 '08 at 21:10
  • @Ambush Commander: if you accept HTML then it's a different problem: sanitization, not escaping. Your input is then in HTML and you don't have the choice of (losslessly) storing it as plain text or HTML. – Kornel Oct 21 '08 at 21:14
  • The distinction is true. However, I see far too many developers going the lossy method and storing filtered text in their database. – Edward Z. Yang Oct 21 '08 at 21:15

Keep in mind that you may need to access the database with something that doesn't understand HTML encoded text (e.g., a reporting tool). I agree that space is a non-issue, but IMHO, putting HTML encoding in the database moves knowledge of your view/front end into the lowest tier in the application, and that is a design mistake.

Craig Stuntz
  • agree! This is firstly ignored when ppl do to prevent XSS. – jack Nov 06 '10 at 17:58
  • can u please have a look at this [related question](http://stackoverflow.com/questions/22297015/should-i-save-in-db-user-input-as-html-encode) of mine ? – Royi Namir Mar 10 '14 at 10:31

The encoding should only only only be done in the display. Without exception.

Andy Lester

Output.

With HTML-encoded text you can't simply check the length of a string (&amp;amp; is 1 character, but strlen() will tell you 5), and you can't easily crop it (you could break an entity).
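
To make the length and cropping problem concrete, here is a small Python illustration (Python's `len()` and slicing stand in for PHP's `strlen()` and substring functions; the sample string is arbitrary):

```python
import html

raw = "Fish & Chips"
encoded = html.escape(raw)  # the form an encode-on-input scheme would store

raw_length = len(raw)          # counts the characters the user typed
encoded_length = len(encoded)  # also counts the extra "amp;" characters

# Cropping the stored, encoded form can cut an entity in half.
broken_crop = encoded[:8]

# Cropping the raw text and encoding afterwards keeps entities intact.
safe_crop = html.escape(raw[:8])
```

Here `broken_crop` ends in the fragment `&am`, which is no longer a valid entity, while `safe_crop` is well-formed.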

You may need to mix strings from database with strings from another source, or read and write them back. Doing this application-wide without missing any escaping and avoiding double escaping is a nightmare.
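
A rough Python sketch of the mixing problem, assuming one value was HTML-encoded before storage and another arrives raw from elsewhere (the variable names and the `render` helper are hypothetical):

```python
import html


def render(value):
    # Escape at output time, as an auto-escaping template would.
    return html.escape(value)


from_db_encoded = html.escape("AT&T")  # already encoded when it was stored
from_api_raw = "R&D"                   # raw text from another source

# Once concatenated, there is no single correct way to treat the result:
mixed = from_db_encoded + " / " + from_api_raw
rendered_mixed = render(mixed)                # database part is now double-escaped
shown_to_user = html.unescape(rendered_mixed) # roughly what a browser displays

# With raw values everywhere, one escape at the output boundary is enough.
all_raw = "AT&T" + " / " + from_api_raw
rendered_raw = render(all_raw)
```

Escaping `mixed` double-escapes the database part, and skipping the escape would leave the other part unprotected; keeping everything raw until output avoids having to track which strings are already encoded.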

PHP tried to do a similar thing with magic_quotes and it turned out to be a huge failure. Don't take the magic_entities route! :)

Kornel