The site has several kinds of content that users are allowed to create and edit: news, articles, etc.

How do I transfer data from the editor to the database correctly and safely?

I'd like to use a WYSIWYG editor, because its potential users will not be very experienced (Markdown and BB-code will be difficult for them; they want something like MS Word =) ).

I'd also like to restrict the editor, for example: no images, only 5 colors, only 3 fonts, etc. (This can be done by limiting the editor's controls.)

My question: how do I make this editor safer? How do I prevent users from adding extra HTML or <script> tags? Do I have to run an HTML filter on the data coming from the database (the saved content that users wrote in the editor) when rendering the template page for that content (a news item or article)?

Should I store the content as HTML in the database (since I want a WYSIWYG editor and it outputs HTML on save)? Or maybe I should convert the editor's HTML to BB-code or Markdown (with all my limitations and restrictions), clearing out all the extra HTML, and then convert the BB-code/Markdown back to HTML when reading the content from the database?

Or maybe there are easier and faster ways to make this safe?

Larry Foobar

1 Answer

If you put the text into the innerHTML of, say, a div, the user can write HTML and have it displayed as HTML later. If you don't want people to inject HTML, use innerText instead. innerText is assigned the same way as innerHTML, but the value does not go through the HTML parser, so it is displayed as literal text.

If you plan to use BB-code or Markdown, you would parse the text for the codes that need to be converted and leave the rest as text.

You could also use a regex to convert the HTML special characters to their entity equivalents first, and then convert the BB-code or Markdown to HTML.
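
As a rough server-side sketch of that two-step idea (escape first, then convert BB-code), here is one way it could look in Python. The function name `render_bbcode` and the `ALLOWED_TAGS` list are made up for illustration, and it only handles simple tags without attributes:

```python
import html
import re

# Hypothetical allowlist of simple BB-code tags; extend as needed.
ALLOWED_TAGS = ("b", "i", "u")


def render_bbcode(text: str) -> str:
    """Escape all HTML special characters, then turn allowed BB-code into HTML."""
    # Step 1: '<', '>', '&' and quotes become entities, so typed HTML is inert.
    safe = html.escape(text)
    # Step 2: convert matching [tag]...[/tag] pairs into real markup.
    for tag in ALLOWED_TAGS:
        pattern = re.compile(
            rf"\[{tag}\](.*?)\[/{tag}\]", re.IGNORECASE | re.DOTALL
        )
        safe = pattern.sub(rf"<{tag}>\1</{tag}>", safe)
    return safe


print(render_bbcode("[b]hi[/b] <script>alert(1)</script>"))
```

Because the escaping happens before any markup is generated, the only tags in the output are the ones the converter itself writes.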


Try this:

When saving to the database: replace known, well-formed HTML with BB-code, e.g. replace <b> with [b]. Ill-formed HTML will remain as typed: <b > will stay <b >. Then do a regex replace on all HTML special characters (i.e. < and >), converting them to entities.

Then, when retrieving from the database, you replace the BB-code with HTML and you are all set.
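
The save/load round trip described above could be sketched like this in Python. `to_storage`, `from_storage` and `ALLOWED_TAGS` are hypothetical names, and the sketch assumes the editor emits plain, attribute-free tags such as <b>:

```python
import html
import re

# Hypothetical allowlist; only these exact, attribute-free tags survive.
ALLOWED_TAGS = ("b", "i", "u")


def to_storage(editor_html: str) -> str:
    """Run before saving: allowed HTML tags -> BB-code, everything else escaped."""
    text = editor_html
    for tag in ALLOWED_TAGS:
        # Matches <b> and </b> only; <b > or <b onclick=...> is left as typed.
        text = re.sub(rf"<(/?){tag}>", rf"[\g<1>{tag}]", text, flags=re.IGNORECASE)
    # Whatever markup is left gets neutralised: '<' -> '&lt;', '>' -> '&gt;'.
    return html.escape(text, quote=False)


def from_storage(stored: str) -> str:
    """Run when rendering: BB-code -> HTML. The rest is already escaped."""
    out = stored
    for tag in ALLOWED_TAGS:
        out = re.sub(rf"\[(/?){tag}\]", rf"<\g<1>{tag}>", out, flags=re.IGNORECASE)
    return out


stored = to_storage("<b>Hi</b> <b >x<script>")
print(stored)
print(from_storage(stored))
```

Note that this does not check that tags are balanced, and a user who types a literal [b] will get bold text. A real setup would likely want a proper parser or an existing sanitizer library on top of this.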

Utilitron
  • Yep, I understand the difference between innerHTML and innerText. But if I use innerText, how do I display data that is kept as HTML in the database? And as for parsing HTML with regex, I've heard that it is a bad idea... – Larry Foobar Jul 08 '11 at 13:30
  • What I mean by using regex to convert "special characters" is replacing every < and > with its entity code. This prevents any written HTML from being used as markup, so you can use innerHTML without fear of injected HTML. It is a two-step process: first replace the unwanted < and >, then parse the rest of the text and convert the BB-code into HTML. – Utilitron Jul 08 '11 at 14:34
  • But with a WYSIWYG editor I save my content as HTML (with tags for bold, etc.). Changing all < and > into their entity codes makes the markup useless, so I shouldn't change all of them. – Larry Foobar Jul 08 '11 at 15:00
  • How about a tokenizer? http://en.wikipedia.org/wiki/Tokenizer#Tokenizer - If you know what you don't want the users to use, ie – Utilitron Jul 08 '11 at 15:31