I have a database column that in some instances contains HTML-encoded text and in other instances contains regular text.

I have this code:

my_string = Nokogiri::HTML.parse(my_string).text

This works if my_string is HTML-encoded but does not work if it's regular text.

Is there a way I can perform the following check?

if html_encoded, Nokogiri::HTML.parse, else just return my_string as is.

I am beginning to think Rails is handling this weirdly. Here is my model code:

  def show_name
    name = Nokogiri::HTML.parse(name).text
    name
  end

Here is my view code:

  <tbody>
    <% names.each do |t| %>
      <tr class="<%= return_cd_error?(t.show_return_cd) ? 'error' : '' %>">
        <td><%= t.name %></td>
      </tr>
    <% end %>

If I put binding.pry before the assignment, name returns the expected value before the parse but "" after it, which is strange:

[2] pry(#<Test::Sess>)> name
=> "Hugh Geissler"
[3] pry(#<Test::Sess>)> name =  Nokogiri::HTML.parse(name).text
=> ""

However, if I remove the Nokogiri parse code, it displays fine.

  • Define `does not work if it's regular text` , what error or unexpected output do you have? – Yevgeniy Anfilofyev Oct 08 '15 at 15:41
  • We need a minimal example of the HTML you're seeing that demonstrates the problem. In other words, what is `my_string`? – the Tin Man Oct 08 '15 at 17:18
  • The HTML I am seeing is: %B5463453593^TEST/GUEST L ^345353536563535 – newbie Oct 08 '15 at 17:34
  • It isn't necessary, or particularly desirable, to put in "EDIT" or "UPDATE" type markers in your question or answer. We can see what changed when if we need to. Instead, incorporate your change into the text as if you had originally entered it. – the Tin Man Oct 08 '15 at 18:21

1 Answer

HTML has no way to mark whether a piece of text is HTML-encoded or not. XML has ways to define embedded markup, but HTML does not.

Instead, you can sniff the text for encoding prefixes such as "&#" and decode only when you find one, or skip the check entirely and just decode it.
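A rough sketch of the sniffing idea, in case the check is wanted. The method name `looks_html_encoded?` and the regular expression are my own assumptions, not part of any library; the pattern only looks for something shaped like a character reference (`&amp;`, `&#39;`, `&#x27;`) and does not validate the entity name.

```ruby
# Hypothetical helper: true when the string contains something that
# looks like an HTML character reference (named, decimal, or hex).
ENTITY_PATTERN = /&(?:#\d+|#x\h+|[a-zA-Z][a-zA-Z0-9]*);/

def looks_html_encoded?(str)
  # to_s guards against nil column values
  str.to_s.match?(ENTITY_PATTERN)
end

puts looks_html_encoded?("Tom &amp; Jerry")
puts looks_html_encoded?("Hugh Geissler")
```

A bare ampersand, as in "AT&T", is not treated as encoded because no semicolon-terminated reference follows it.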

Nokogiri can decode encoded HTML, but it wouldn't be my first tool for the job. Instead, something like CGI::unescapeHTML(str), from the CGI library in Ruby's standard library, can do it. See "How do I encode/decode HTML entities in Ruby?" for more information.
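As a minimal sketch of the "just decode it" route: `CGI.unescapeHTML` leaves text without entities unchanged, so no check is strictly needed. The `Person` struct below is a hypothetical stand-in for the model in the question, not the asker's actual class.

```ruby
require 'cgi'

# Hypothetical stand-in for the model in the question.
Person = Struct.new(:name) do
  def show_name
    # Decode entities; plain text has none, so it comes back unchanged.
    # to_s guards against a nil column value.
    CGI.unescapeHTML(name.to_s)
  end
end

puts Person.new("Tom &amp; Jerry").show_name
puts Person.new("Hugh Geissler").show_name
```

The sketch deliberately avoids assigning to a local variable called `name`. In Ruby, writing `name = f(name)` inside a method creates a new local `name` that is still nil when the right-hand side is evaluated, so the attribute is never read.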
