How can I extract HTML escape characters/entities as text when scraping the web? (Ruby & Nokogiri)

Question

In my Ruby + Mechanize (Nokogiri) script I use this piece of code:

row.at_xpath('td[3]/div[1]/a/text()').to_s.strip

on a forum where the post title html looks like:

<a href="showthread.php?t=233891" >&lt;/body&gt; on Footer ?</a>

and from that XPath expression I receive this string: &lt;/body&gt; on Footer ?

but I would like to get what I see in the web browser: </body> on Footer ?

How can I do that for all HTML escape characters/entities?

score 1 · Accepted Answer · edited May 23 '17 at 12:19


Please take a look at this post on unescaping HTML entities,

or

There is a Ruby gem called htmlentities.


answered Jan 23 '10 at 05:19

YOU

@S.Mark: I didn't know it was called entities (too). htmlentities works like a charm. Thank you. – Radek Jan 23 '10 at 06:36
I think it's easier to do it with Nokogiri itself, à la http://stackoverflow.com/questions/2567029/how-to-make-nokogiri-transparently-return-un-encoded-html-entities-untouched – jm0 Jan 22 '14 at 21:48
