Nokogiri 1.5.0

I can't get Nokogiri to output a parsed fragment containing a link with query parameters without altering the ampersand in the href: the ampersand is replaced by its HTML entity.

require 'nokogiri'

f = Nokogiri::HTML.fragment(%q{<a href="http://example.com?this=1&that=2">Testing</a>})
f.to_s    # => "<a href=\"http://example.com?this=1&amp;that=2\">Testing</a>"
f.to_html # => "<a href=\"http://example.com?this=1&amp;that=2\">Testing</a>"

Passing an encoding, as in to_html(encoding: 'UTF-8') or to_html(encoding: 'US-ASCII'), makes no difference.

This seems like a common need: parsing a link in a valid format and rendering it back as valid HTML.

The related question "How to make Nokogiri transparently return un/encoded Html entities untouched?" was no help.

aceofspades

1 Answer

Nokogiri's HTML parser automatically corrects errors in the source document. The bare ampersand in the URL is an error as far as the parser is concerned, so Nokogiri corrects it. If you look at f.errors, you can see that the parser does not consider &that a valid entity and reports a missing semicolon, so it rewrites the ampersand as &amp;, which makes the output valid HTML.
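To illustrate why the escaped output is equivalent to the original link, here is a minimal sketch that uses Ruby's standard CGI helpers rather than Nokogiri itself. The variable names are made up for illustration, and the two strings are simply the href values from the question. It shows that decoding the serialized href the way an HTML consumer would gives back the original URL.

```ruby
require 'cgi'

# The href as written in the question's source markup.
original = 'http://example.com?this=1&that=2'

# The href as it appears in the serialized output from the question.
serialized = 'http://example.com?this=1&amp;that=2'

# An HTML consumer decodes entities in attribute values, so the
# escaped form resolves to the same URL as the original.
decoded = CGI.unescapeHTML(serialized)

puts decoded
puts decoded == original
```

If you need the raw URL in Ruby rather than in markup, reading the attribute from the parsed node (for example f.at_css('a')['href']) should give you the value with a plain ampersand, since the escaping is applied only when the fragment is serialized.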

John Douthat