I'm looking to do some rudimentary cleansing of HTML. Basically, I want to create a whitelist of allowed tags and reject everything else.
Is Hpricot worth it in this case? Does it have a feature I've overlooked that will save me from reinventing the wheel? Or is it best to just write a whitelist of tags using regex and run the HTML document through that?
Regex can get really tricky with HTML, and I know a lot of experts are strictly against using it for this, but I'm just looking for the path of least resistance.
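For concreteness, the regex approach I have in mind would look roughly like the sketch below. It is only a minimal illustration: `ALLOWED_TAGS` and `strip_disallowed_tags` are names I made up, the tag list is an arbitrary example, and nothing here uses Hpricot.

```ruby
# Hypothetical whitelist; the tag names are only examples.
ALLOWED_TAGS = %w[b i em strong p br].freeze

# Naive sketch: keep whitelisted tags (with their attributes dropped)
# and remove every other tag. Text between removed tags is left in place.
def strip_disallowed_tags(html, allowed = ALLOWED_TAGS)
  html.gsub(%r{<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>}) do
    closing = $1
    name = $2.downcase
    allowed.include?(name) ? "<#{closing}#{name}>" : ""
  end
end

puts strip_disallowed_tags('<p onclick="x()">Hi <script>alert(1)</script><b>there</b></p>')
# prints: <p>Hi alert(1)<b>there</b></p>
```

Even this small case shows the kind of trouble I'm worried about: the `<script>` tags are removed but their contents stay behind, and I'd expect comments, malformed tags, or a `>` inside an attribute value to slip past a pattern like this.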