
I know that I can parse a Markdown document and render it as HTML with Kramdown in Ruby using something like

require 'kramdown'

s = 'This is a _document_'
Kramdown::Document.new(s).to_html
# => "<p>This is a <em>document</em></p>\n"

In this case, the string s may contain a full document in Markdown syntax.

What I want to do, however, is to parse s assuming that it contains only span-level Markdown syntax, and obtain the rendered HTML. In particular, there should be no <p>, <blockquote>, or, e.g., <table> in the rendered HTML.

s = 'This is **only** a span-level string'
# .. ??? ...
# desired: 'This is <strong>only</strong> a span-level string'

How can I do this?

Juan A. Navarro
  • So you want to strip out all block-level elements? This is the default behavior of kramdown. See http://kramdown.gettalong.org/options.html – Mark Thomas Aug 05 '14 at 14:04
  • That's also what I read, but the output still contains the `p`'s. Haven't figured out how to get kramdown to actually remove those. – Juan A. Navarro Aug 05 '14 at 14:14
  • It appears that option is for parsing raw HTML; it doesn't have an effect on the output. The output is not changeable, as they aim to be consistent with other Markdown implementations. You'll probably have to post-process. – Mark Thomas Aug 05 '14 at 15:03
  • You could just post-process with Nokogiri quite easily. – Mike H-R Aug 05 '14 at 15:03
  • Sanitize uses Nokogiri under the hood, it's even easier. – Mark Thomas Aug 05 '14 at 18:16
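Following up on the comments' suggestion to post-process, here is a minimal plain-Ruby sketch for one narrow case. It assumes the input is a single paragraph, so kramdown's output has the shape `<p>...</p>` followed by a newline, and it simply removes that one wrapper. `unwrap_paragraph` is a hypothetical helper and is not part of kramdown. It leaves any other output (several paragraphs, lists, tables) unchanged instead of guessing.

```ruby
# Hypothetical helper (not part of kramdown): removes a single outer
# <p>...</p> wrapper from kramdown's HTML output. Assumes the Markdown
# source was one paragraph; anything else is returned unchanged
# (apart from surrounding whitespace).
def unwrap_paragraph(html)
  stripped = html.strip
  match = stripped.match(%r{\A<p>(.*)</p>\z}m)
  # If the captured text still contains "</p>", the input had several
  # paragraphs and unwrapping would leave unbalanced tags, so bail out.
  return stripped if match.nil? || match[1].include?('</p>')

  match[1]
end
```

With kramdown loaded, usage would be `unwrap_paragraph(Kramdown::Document.new(s).to_html)`.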

2 Answers


I would post-process the output with the sanitize gem.

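require 'kramdown'
require 'sanitize'

html = Kramdown::Document.new(s).to_html
# Keep only the whitelisted inline tags; other tags such as <p> are
# removed, but their text content is kept.
output = Sanitize.fragment(html, elements: ['em', 'strong', 'code'])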

The `elements` option is a whitelist of allowed tags; just add every tag you want to keep. The gem ships with a few predefined whitelists, but none of them matches exactly what you're looking for. (BTW, if you want a list of all the HTML5 elements allowed inside a span, see the WHATWG's list of "phrasing content".)
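A bare `elements` list also strips every attribute, so any links kramdown produces would lose their `href`. Below is a sketch of a fuller config for readers who also want to keep links, images, and abbreviations. The constant name `SPAN_LEVEL_CONFIG` and the chosen tags and schemes are illustrative assumptions. They are not one of Sanitize's predefined whitelists.

```ruby
# Hypothetical Sanitize config for the span-level tags kramdown
# typically emits. Intended use: Sanitize.fragment(html, SPAN_LEVEL_CONFIG)
SPAN_LEVEL_CONFIG = {
  # Tags to keep; any other tag (e.g. <p>) is removed, its text kept.
  elements: %w[em strong code a img br abbr],

  # Attributes allowed per element; all other attributes are stripped.
  attributes: {
    'a'    => %w[href title],
    'img'  => %w[src alt title],
    'abbr' => %w[title]
  },

  # URL schemes allowed in link and image attributes.
  protocols: {
    'a'   => { 'href' => ['http', 'https', 'mailto', :relative] },
    'img' => { 'src'  => ['http', 'https', :relative] }
  }
}.freeze
```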

I know this question isn't about Rails, but for the benefit of readers using Rails: use the built-in `sanitize` helper.

Mark Thomas
  • I would rather *not add* the additional markup than have it removed. But if there is no other simple solution, I might just do this. – Juan A. Navarro Aug 06 '14 at 07:57
  • For security purposes, whitelists are preferred over blacklists. This is particularly a concern if the content is end-user created and the application generates public pages. – Mark Thomas Aug 06 '14 at 14:04
  • Sure, I always keep that in mind. But in my case, the content is created by myself, not by an end user. Sanitization (somewhat) does what I want as a side effect, but my end goal here is not sanitization. – Juan A. Navarro Aug 06 '14 at 14:09

You can create a custom parser that empties kramdown's internal list of block-level parsers, so that only span-level syntax is processed.

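require 'kramdown'

# A kramdown parser subclass with no block-level parsers, so no block
# elements (<p>, <blockquote>, <table>, ...) are produced.
class Kramdown::Parser::SpanKramdown < Kramdown::Parser::Kramdown
  def initialize(source, options)
    super
    @block_parsers = [] # disable all block-level parsers
  end
end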

Then you can use it like this:

text = Kramdown::Document.new(text, :input => 'SpanKramdown').to_html

This should do what you want "the right way".

rr-