I understand that if a user needs to supply HTML code as part of a form input (e.g. in a textarea) then I use an Anti-Samy policy to filter out the hazardous HTML that's not permitted.

However, I have some text-fields and text-areas which should be text-only. No HTML code at all should be inserted into the DB from these fields.

I am therefore trying to sanitize the inputs so that only raw text is inserted into the database. I believe I can do this in two ways:

  1. Use a regular expression to filter out HTML code, e.g. #REReplaceNoCase(FORM.InputField, "[^a-zA-Z\d\s:]", "", "ALL")#
  2. Use a strict text-only Anti-Samy policy

Which option is the correct, good-practice way to remove any user-entered HTML code from a text field? Or are there further options available to me?

volume one
  • 1) https://www.owasp.org/index.php/Don%E2%80%99t_Write_Your_Own_Security_Code:_The_OWASP_Enterprise_Security_API 2) well... you get the point. – Adam Cameron Jun 11 '15 at 22:08

1 Answer

While you could use AntiSamy to do it, I don't know how sensible that would be. It kind of defeats the purpose of its flexibility, I think. I'd also be curious about the overhead, even if minimal, of running that as a filter compared with just a regex.

Personally I'd probably opt for the regex route in this scenario. Your example appears to strip only the brackets and other special characters, leaving the tag names behind. Is that acceptable in your situation? (Understandable if it was just an example.) Perhaps use something like this:

reReplace(string, "<[^>]*>", "", "ALL");
Tony Junkes
  • I'm not that great with Regex but I thought I was doing a negate. So basically I was asking CF to replace any character that is not a word-character (a-z) and is not a digit (0-9) and not a space (\s). Would it not work like I imagined? – volume one Jun 12 '15 at 09:35
  • I'm no pro either but your current regex removes the brackets and special characters only. Say you have a string of "<b>hello</b>"... You will get "bhellob". I made a gist example of your regex and mine here: https://gist.github.com/cfchef/3f8ecc2fae8b1272a11b You can run the code at cflive.net to see it in action or try this link http://trycf.com/editor/gist/3f8ecc2fae8b1272a11b/acf11 though TryCF.com seems to be down for me at this very moment. – Tony Junkes Jun 12 '15 at 09:58
  • So you want to filter out commas, apostrophes, etc? You sure you're not robbing Peter to pay Paul here? – Dan Bracuk Jun 12 '15 at 10:59
  • @DanBracuk Some of the inputs are product codes which really shouldn't have anything in them except letters and numbers. But I think Tony's regex is the correct one to use. – volume one Jun 12 '15 at 12:52
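
For reference, the difference between the two patterns discussed above can be sketched in CFScript. The sample input is made up for illustration, and encodeForHTML() assumes ColdFusion 10 or later. Treat this as a rough sketch of how the two regexes behave, not as a vetted sanitizer.

```cfml
<cfscript>
// Hypothetical sample input, for illustration only
input = "<b>hello</b>, it's 5:00";

// Pattern from the question: removes every character that is not
// a letter, a digit, whitespace or a colon
allowListed = reReplaceNoCase(input, "[^a-zA-Z\d\s:]", "", "ALL");
// allowListed is now: bhellob its 5:00

// Pattern from the answer: removes anything shaped like a tag,
// i.e. "<", then any run of characters other than ">", then ">"
tagStripped = reReplace(input, "<[^>]*>", "", "ALL");
// tagStripped is now: hello, it's 5:00

// Encode on output so any leftover markup is displayed, not rendered
writeOutput(encodeForHTML(allowListed) & "<br>");
writeOutput(encodeForHTML(tagStripped));
</cfscript>
```

The first pattern acts as an allow-list, which suits fields such as product codes where only letters and digits make sense. The second removes only tag-shaped text and leaves ordinary punctuation such as commas and apostrophes intact.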