
Right now we're using the sanitize gem: https://github.com/rgrove/sanitize

Problem is if you enter "hello & world" sanitize is saving that in the DB as:

hello &amp; world

How can you whitelist the &? We want Sanitize to remove all potentially malicious HTML and JS/script tags, but we're OK allowing the ampersand.

Ideas? Thanks

AnApprentice
  • Maybe Sanitize.clean(html, Sanitize::Config::RELAXED) # => '&' – bilash.saha Nov 08 '11 at 19:19
  • Thanks, but RELAXED allows just about everything. I'd like to whitelist &, I just can't find out how – AnApprentice Nov 08 '11 at 19:21
  • @bilash.saha The relaxed config will still HTML-escape entities; what you posted will still output "Hello &amp; world" – Unixmonkey Nov 08 '11 at 19:33
  • Use a [`Loofah`](https://stackoverflow.com/questions/8055773/rails-gem-sanitize-how-to-whitelist/59215985#answer-59215985) - it's built in and perfect :) – SRack Dec 06 '19 at 15:29

5 Answers


Sanitize always encodes special characters in its output as HTML entities, so that the result is valid HTML/XHTML.

The best way I've found is to filter the output:

Sanitize.fragment("hello & world").gsub('&amp;', '&') #=> "hello & world"
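A minimal sketch of the same output-filtering idea, generalized to a small whitelist of entities. It assumes the input has already been entity-encoded; Ruby's standard-library `CGI.escapeHTML` stands in for that encoding step here (unlike Sanitize, it escapes tags instead of removing them). `ALLOWED_ENTITIES` and `unescape_allowed` are hypothetical names, not part of Sanitize.

```ruby
require 'cgi'

# Hypothetical whitelist: encoded entity => literal character to restore.
ALLOWED_ENTITIES = { '&amp;' => '&' }.freeze
ALLOWED_PATTERN  = Regexp.union(ALLOWED_ENTITIES.keys)

# Decode only whitelisted entities. A single gsub pass means restored
# characters are never rescanned, so "&amp;lt;" becomes "&lt;", not "<".
def unescape_allowed(encoded)
  encoded.gsub(ALLOWED_PATTERN, ALLOWED_ENTITIES)
end

encoded = CGI.escapeHTML('hello & <b>world</b>')
# encoded is "hello &amp; &lt;b&gt;world&lt;/b&gt;"
puts unescape_allowed(encoded)
# prints: hello & &lt;b&gt;world&lt;/b&gt;
```

Adding entries to the hash extends the whitelist; whitelisting `&lt;` or `&gt;` would defeat the purpose, since it lets tag delimiters back into the stored text.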
Unixmonkey
  • This would solve the & character, but it would not scale to all the various characters that the HTML engine will convert into entities. Keeping track of all of them would be a headache as well. @agustin's answer below is a better solution IMO – Shyam Habarakada Mar 12 '14 at 15:51
  • @ShyamHabarakada The problem I have with Rails' built-in `sanitize()` and `strip_tags` is that they don't correct malformed markup, so an unmatched `<` character can destroy the page layout. `strip_tags('Strip "Strip – Unixmonkey Mar 12 '14 at 18:32
  • True that, about malformed markup. We sanitize primarily as a way of preventing HTML from getting into params that should not have HTML. Stripping tags via the built in strip_tags works fine for us on that. It sounds like that's the scenario in this question as well. I agree, if you want full sanitization, a better solution that has a proper DOM engine is needed. But for param sanitization, IMO, that seems overkill. – Shyam Habarakada Mar 14 '14 at 23:23

Use the strip_tags() method instead.

http://api.rubyonrails.org/classes/ActionView/Helpers/SanitizeHelper.html#method-i-sanitize

Agustin

None of the other answers worked for me. The best approach I've found for my use case is the built-in Loofah gem:

good = '&'
bad = "<script>alert('I am evil');</script>"
greater_than = '>' # << my use case

Loofah.fragment(good).text(encode_special_chars: false)
# => "&"
Loofah.fragment(greater_than).text(encode_special_chars: false)
# => ">"

Loofah.fragment(bad).text(encode_special_chars: false)
# => "alert('I am evil');"

# And just for clarity, without the option passed in:
Loofah.fragment(good).text
# => "&amp;"

It's not flawless though, so be incredibly careful:

really_bad = "&lt;script&gt;alert('I am evil');&lt;/script&gt;"
Loofah.fragment(really_bad).text(encode_special_chars: false)
# => "<script>alert('I am evil');</script>"
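The same pitfall can be reproduced without Loofah, using Ruby's standard-library `CGI.unescapeHTML` as a stand-in for the decoding step. The `safe_decode` helper is a hypothetical, deliberately strict guard (not a Loofah feature): it keeps the decoded text only if decoding produced no `<`, and otherwise falls back to the still-encoded string.

```ruby
require 'cgi'

# Entity-encoded input contains no real tags, so a tag stripper leaves it alone...
really_bad = "&lt;script&gt;alert('I am evil');&lt;/script&gt;"

# ...but decoding the entities afterwards turns it into live markup.
puts CGI.unescapeHTML(really_bad)
# prints: <script>alert('I am evil');</script>

# Hypothetical guard: return the decoded text only if it contains no '<'
# (no HTML tag can open without one); otherwise keep the encoded original.
def safe_decode(text)
  decoded = CGI.unescapeHTML(text)
  decoded.include?('<') ? text : decoded
end

puts safe_decode('Tom &amp; Jerry') # prints: Tom & Jerry
puts safe_decode(really_bad)        # prints the encoded string unchanged
```

The trade-off is that harmless text such as `a &lt; b` stays encoded, while a lone `>` is still decoded, which matches the `greater_than` use case above.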

More information on the `text` method is available in the Loofah documentation.

Definitely the most efficient approach for what I needed to do!

SRack
  • When you say "built in" what do you mean? Is it built into Rails? It looks like a gem just like sanitize which isn't what I'd consider built in. – Dan Jun 11 '21 at 16:38
  • It's included with Rails, @Dan. `gem dependency loofah --reverse-dependencies` gives you (among others): `Used by rails-html-sanitizer-1.3.0 (loofah (~> 2.3))`. That gem itself is built into Rails, see [here](https://github.com/rails/rails-html-sanitizer): "In Rails 4.2 and above this gem will be responsible for sanitizing HTML fragments in Rails applications". – SRack Jun 14 '21 at 09:58

Unixmonkey's answer is what we ended up doing:

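require 'sanitize'

# Strip all markup with Sanitize, then turn the entities listed in
# ESCAPE_SEQUENCES (pairs such as ['amp', 38]) back into literal characters.
def remove_markup(html_str)
  marked_up = Sanitize.clean(html_str)

  ESCAPE_SEQUENCES.each do |esc_seq, ascii_seq|
    # e.g. replace "&amp;" with 38.chr, i.e. "&"
    marked_up = marked_up.gsub('&' + esc_seq + ';', ascii_seq.chr)
  end
  marked_up
end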

Where ESCAPE_SEQUENCES was an array of [entity name, ASCII code] pairs for the characters we didn't want escaped.
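One caveat with this loop: each `gsub` runs over the output of the previous one, so the order of ESCAPE_SEQUENCES matters once it holds more than `amp`. The sketch below uses a hypothetical table with `lt` added to show how text a user typed as `&lt;script` (which Sanitize would store as `&amp;lt;script`) can come back as a live `<`. It takes an already-sanitized string, so it runs without the gem; `sequential_unescape` is a hypothetical name.

```ruby
# Hypothetical table in the [entity name, ASCII code] shape the loop destructures.
ESCAPE_SEQUENCES = [['amp', 38], ['lt', 60]].freeze

# Same sequential replacement as remove_markup, minus the Sanitize call.
def sequential_unescape(text)
  ESCAPE_SEQUENCES.each do |esc_seq, ascii_seq|
    text = text.gsub('&' + esc_seq + ';', ascii_seq.chr)
  end
  text
end

# "&amp;lt;" -> "&lt;" on the first pass, then "&lt;" -> "<" on the second.
puts sequential_unescape('&amp;lt;script') # prints: <script
puts sequential_unescape('fish &amp; chips') # prints: fish & chips
```

Keeping the table to `amp` alone, or replacing `amp` last, avoids this particular double decoding.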

Ashley Raiteri

As of Rails 4.2, #strip_tags does not decode HTML special characters:

strip_tags("fun & co")
  => "fun &amp; co"

Otherwise, already-encoded input would come back as live markup:

strip_tags("&lt;script&gt;")
  => "<script>"

If you only want the ampersand, I'd suggest filtering the output as @Unixmonkey suggested, and restricting the replacement to &amp; only:

strip_tags("<bold>Hello & World</bold>").gsub(/&amp;/, "&")
  => "Hello & World"
Armando