9

I'm trying to escape user-generated content in Rails. I have used the raw and sanitize helpers to filter content like this:

raw(sanitize(code, :tags =>   ['<', 'h2','h3','p','br','ul','ol','li','code','pre','a'] ))

The tags listed are the ones allowed in the content.

The problem appears when I test it with a command that contains a less-than sign, like this:

mysql -u sat -p -h localhost database <  data.sql

Inside pre and code blocks, sanitize removes everything after the less-than sign (<).

Please help me figure out a way to do this.

mansoor.khan
  • I can't reproduce your problem in the console. Perhaps you should show the exact content you're passing to `sanitize`. Also, '<' isn't a tag, although including it in the list of allowed tags doesn't cause problems as far as I can tell. – Dave Schweisguth Apr 02 '16 at 13:14
  • This line: "<pre>mysql -u sat -p -h localhost database <  data.sql</pre>" will result in only this: "mysql -u sat -p -h localhost database", and the closing pre tag is also removed. That results in weird markup. – mansoor.khan Apr 02 '16 at 14:21
  • I am thinking it has to do with the way data is stored in the database. Can you suggest the best way to store and render source code? – mansoor.khan Apr 04 '16 at 11:59
  • Well, rendering is what you've been asking about, but as far as storage I don't see why you'd use anything more complicated than a sufficiently large string type. If your database is breaking your content, you ought to be able to detect that by comparing before and after. – Dave Schweisguth Apr 04 '16 at 13:17
  • No it isn't. It's just the rendering problem. I'm using raw and sanitize helpers: raw(sanitize(code, :tags => ['h2','h3','p','br','blockquote', 'ul','ol','li','strong', 'code','pre','a'] )). It works fine except where it encounters a '<', it breaks the following code. – mansoor.khan Apr 04 '16 at 13:24
  • Why the downvote? Please drop a comment to educate so that I'll be mindful in future. – mansoor.khan Apr 05 '16 at 04:32
  • Let me understand: you want to be able to save something like "mysql -u sat -p -h localhost database < data.sql" to the database within "<pre>" tags, but Rails strips everything in between if you use them? – Gustavo Rubio Apr 11 '16 at 18:26
  • @gustavo-rubio I was approaching it the wrong way. I've answered my own question with the details now. – mansoor.khan Apr 12 '16 at 04:13

6 Answers

4

I don't believe this is possible using the default sanitize method within Rails.

Instead try using the Sanitize gem (https://github.com/rgrove/sanitize)

require 'sanitize'

allowed_elements = ['h2','h3','p','br','ul','ol','li','code','pre','a']
code             = "<pre>mysql -u sat -p -h localhost database < data.sql</pre>"

Sanitize.fragment(code, elements: allowed_elements)
# => <pre>mysql -u sat -p -h localhost database &lt; data.sql</pre>

To save sanitized content to the database, add a before_save callback to your model that runs Sanitize on the user-generated content and stores the result, e.g.

class MyModel < ActiveRecord::Base 
  ALLOWED_ELEMENTS = ['h2','h3','p','br','ul','ol','li','code','pre','a']

  before_save :sanitize_code

  private

  def sanitize_code
    self.code = Sanitize.fragment(code, elements: ALLOWED_ELEMENTS)
  end
end

When you output the content you just need to use the raw view helper e.g.

<%= raw @instance.code %>
Kieran Johnson
2

Rails 3 added an html_safe flag to every String instance. Every string rendered in a view is HTML-escaped unless it is marked html_safe (simplified). What raw does is mark the string as html_safe without escaping it, so you should only pass it a string that is already safe or escaped.

A possible solution could look something like this:

strip_tags(code).html_safe

You might have to add additional checks / string replacements depending on your use case.

Based on your comment, you probably need a slightly more complex version. You could replace each character you want to keep with a placeholder, sanitize the string, and then reverse the replacement, so that sanitize does not remove more than you actually want. Try something like this:

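code = "mysql -u sat -p -h localhost database < data.sql"

# Placeholder name => character to protect from sanitize
ALLOWED_SIGNS = {
  :lower_than => "<".html_safe
}

s = code.dup
# Swap each protected character for a %{name} placeholder (sub! replaces only the first occurrence)
ALLOWED_SIGNS.each { |k, v| s.sub!(v, "%{#{k}}") }
# Sanitize, then substitute the original characters back in
sanitize(s) % ALLOWED_SIGNS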
Alex
  • The thing is I have to sanitize user input but also allow certain html tags as stated in the question. The problem is sanitize also removes less than sign '<' so I couldn't output SQL code as stated in the question. Do you understand now? – mansoor.khan Apr 06 '16 at 02:49
  • I've updated my answer and added some more example code which still uses the `sanitize` method. Thus you're able to add your allowed `tags` in the same way as in your question. – Alex Apr 07 '16 at 20:49
2

It seems the whole issue was with the way the data was being stored in the database. Previously, a less-than sign '<' was saved as is, but now it is escaped, so a '<' is saved as &lt;, which seems to have solved the problem.

I discovered this by accident while using the tinymce-rails WYSIWYG editor, which escapes the '<' automatically.

@kieran-johnson's answer might have done the same but tinymce-rails solved it without installing an extra gem.
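For anyone not using an editor that escapes for them, the same effect can be sketched in plain Ruby. This is only an illustration under assumptions: `wrap_in_pre` is a hypothetical helper name, not something from Rails or tinymce-rails, and it uses the standard library's `ERB::Util.html_escape` to escape the snippet before it is wrapped in pre tags and stored.

```ruby
require "erb"

# Hypothetical helper (not part of Rails or tinymce-rails):
# escape a raw code snippet, then wrap it in <pre> tags for storage.
def wrap_in_pre(snippet)
  "<pre>#{ERB::Util.html_escape(snippet)}</pre>"
end

command = "mysql -u sat -p -h localhost database < data.sql"
stored  = wrap_in_pre(command)

puts stored
# prints: <pre>mysql -u sat -p -h localhost database &lt; data.sql</pre>
```

Because the '<' is stored as &lt;, the sanitizer no longer has a stray tag opener to trip over, and the browser still displays it as '<'.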

Thank you to all of you who took the time to help.

mansoor.khan
1

This might help. The Rails sanitizers let you pass a whitelist of tags and attributes that should be kept during sanitization. Note that full_sanitizer strips every tag and ignores such a list:

ActionView::Base.full_sanitizer.sanitize(html_string) # Strips all tags

A whitelist of tags and attributes can be passed to the safe-list sanitizer (safe_list_sanitizer in current Rails, white_list_sanitizer in older versions) as below:

ActionView::Base.safe_list_sanitizer.sanitize(html_string, :tags => %w(img br p), :attributes => %w(src style))

The statement above allows the tags img, br and p and the attributes src and style.

Satishakumar Awati
  • How will you allow a less than "<" sign as mentioned in the mysql query in the question? – mansoor.khan Apr 07 '16 at 04:02
  • In one of my applications, this is how I allowed the img, br and p elements and the src and style attributes. As '<' is not a standard HTML element, the sanitizer does not handle it. You can try a workaround: record the position of every '<', delete them from the code, pass the code to sanitize, and then add the '<' characters back. – Satishakumar Awati Apr 07 '16 at 09:27
  • That is exactly how I am allowing tags if you pay attention to my question description. If I could "try some work around", I wouldn't be here asking this question. – mansoor.khan Apr 07 '16 at 13:23
  • @IslamWazery exactly – Zia Ul Rehman Mughal Nov 23 '16 at 05:41
0

nokogiri gem solves the problem:

gem 'nokogiri'

Nokogiri::HTML::DocumentFragment.parse('<b>hi</b> x > 5').text
 => "hi x > 5" 
0

Consider replacing "<" with its numeric character reference &#60; (or the named entity &lt;) before running it through the sanitize method. It should then pass through as an escaped entity and render as the "<" character instead of being parsed as HTML.
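A minimal sketch of that pre-processing step in plain Ruby, under assumptions: `STRAY_LT` and `escape_stray_lt` are hypothetical names, and the tag list is copied from the question. The regular expression escapes any "<" that does not open or close one of the allowed tags. It is meant to run before sanitize, not to replace it.

```ruby
# Allowed tags, copied from the question.
ALLOWED_TAGS = %w[h2 h3 p br ul ol li code pre a].freeze

# Matches "<" unless it is immediately followed by an allowed tag name
# (optionally preceded by "/" for a closing tag).
STRAY_LT = /<(?!\/?(?:#{ALLOWED_TAGS.join("|")})\b)/

# Hypothetical helper: turn every stray "<" into the &lt; entity.
def escape_stray_lt(html)
  html.gsub(STRAY_LT, "&lt;")
end

input = "<pre>mysql -u sat -p -h localhost database < data.sql</pre>"
puts escape_stray_lt(input)
# prints: <pre>mysql -u sat -p -h localhost database &lt; data.sql</pre>
```

One side effect to be aware of: the opening "<" of a disallowed tag such as script is escaped as well, so that tag would be displayed as text rather than stripped.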

Kirill