We've been using the Sanitize gem together with HTMLEntities to clean up user-supplied HTML. Sanitize used to depend on Hpricot but now uses Nokogiri, and I need to remove Hpricot from the app.
Here are two test strings, each followed by the output I'm expecting:
Test string 1:
"SOME TEXT < '<span style='background-image: url(\"http://evil.ru/webbug.png\")'>MORE' & TEXT!!!</span>"
expected_text = "SOME TEXT < 'MORE' & TEXT!!!"
Test string 2 (exercises a slightly different path):
'Support <i>odd</i> chars like " < \' ‽'
expected_text = 'Support <i>odd</i> chars like " < \' ‽'
Is this something you've solved? What tools did you use?