I am using Nokogiri in Rails to parse my HTML and convert self-closing tags to regular ones. That works great, but it also percent-encodes our template tags, which are delimited by [% and %], when they appear inside attributes. For example:

html = "<a href='[% hello %]'>Hello from [% Us %]</a>"
Nokogiri::HTML::DocumentFragment.parse(html).to_html

will convert to:

<a href="%5B%%20hello%20%%5D">Hello from [% Us %]</a>

How do I avoid it without using gsub after the conversion?

This did not help:

html = "<a href='[% hello %]'>Hello from [% Us %]</a>"
doc = Nokogiri::HTML::Document.new
doc.encoding = 'UTF-8'
doc.fragment(html).to_html
#=> "<a href=\"%5B%%20hello%20%%5D\">Hello from [% Us %]</a>" 
the Tin Man
Ben
  • Possible duplicate of [How to make Nokogiri transparently return un/encoded Html entities untouched?](https://stackoverflow.com/questions/2567029/how-to-make-nokogiri-transparently-return-un-encoded-html-entities-untouched) – anothermh Oct 28 '19 at 21:46
  • @anothermh see update above. Referred solution does not seem to help. Maybe it's the `.to_html` ? If so, what's the right way? – Ben Oct 28 '19 at 21:52
  • Upvoted for scoping the problem and giving a test code that can be run easily . Can you please update the question with what is the expected output? – sameera207 Oct 28 '19 at 23:12
  • @Ben Try `doc.fragment(html).to_xml` => `"Hello from [% Us %]"`. Should work because it's XML, and XML won't necessarily know about _HTML_ entities. – anothermh Oct 28 '19 at 23:30

1 Answer

@anothermh actually answered my question (see the comments below my question). I ended up using his suggestion, to_xml.

However, I needed more from the parsing than I mentioned in the question. I had to keep the special characters in the tags, but also convert self-closing tags to regular tags.

My solution was to use the XHTML format, described here: https://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Node/SaveOptions#FORMAT-constant

html = "... my html ..."    
doc = Nokogiri::HTML::Document.new
doc.encoding = 'UTF-8'
final = doc.parse(html).to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML)
Ben