
I'm currently working with many XML files, and some of the data is encoded as shown below. How do I work with this data? Until now I have simply been using gsub() to replace the characters with blanks. Maybe there is an easier way.

Here is the description &amp;lt;br&amp;gt;&amp;lt;br&amp;gt;Here is some more text

I have been doing this:

gsub('&amp;','')

Or this:

gsub('&amp;','&')
curv
  • Your data is double-encoded. `<` has become `&lt;`, which has been re-encoded to become `&amp;lt;`. How did your files become encoded in the first place? Are they stored in a database? – user229044 Feb 16 '11 at 22:17
  • They are 3rd-party data feeds; unfortunately, I have no control over them :( – curv Feb 16 '11 at 22:18
  • You need to [decode them](http://stackoverflow.com/questions/1600526/how-to-encode-decode-html-entities-in-ruby/1600584#1600584), twice. – user229044 Feb 16 '11 at 22:20

1 Answer


I think you can use `CGI.unescapeHTML` to decode the data (http://ruby-doc.org/stdlib/libdoc/cgi/rdoc/classes/CGI.html#M000096). I hope this helps.
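As a rough sketch of how that might look for double-encoded data, here is `CGI.unescapeHTML` applied twice, as the comments on the question suggest. The sample string is an assumption modeled on the question, not taken from a real feed, and the variable names are only for illustration.

```ruby
require 'cgi'

# Hypothetical sample: markup that has been HTML-escaped twice,
# so "<" arrives as "&amp;lt;" instead of "&lt;".
raw = 'Here is the description &amp;lt;br&amp;gt;&amp;lt;br&amp;gt;Here is some more text'

# First pass removes the outer layer: "&amp;lt;" becomes "&lt;".
once = CGI.unescapeHTML(raw)

# Second pass removes the inner layer: "&lt;" becomes "<".
twice = CGI.unescapeHTML(once)

puts once
puts twice
```

Compared with a hand-written `gsub`, this should handle the other standard entities (`&quot;`, `&gt;`, numeric references) in the same pass. If only some feeds are double-encoded, the second call may need to be conditional.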

jrichardlai